ABSTRACT

Allocation of resources in “next-generation” real-time operating systems requires
some important features in addition to those demonstrated by current systems, resulting
in an increased complexity of each system. The allocation is closely related to the
scheduling, and the two are based on time considerations, rather than on a static priority
scheme. The allocation is fault tolerance motivated, to cope with the application’s
reliability goals. Distributed system issues and adaptive behavior requirements
increase the complexity and significance of the allocation approach.

The allocation scheme we propose here accomplishes the hard real-time goal of
guaranteeing deadline satisfaction whenever a job is accepted. In addition, this allocation
scheme supports fault tolerance objectives in both damage containment and resiliency
requirements. It does this in cooperation with a schedulability verification
mechanism, and with an object architecture in which for each object there exists a
calendar that maintains the time of its execution. A nice feature of this scheme is the
way in which it can be used for reallocation while increasing the resiliency.

1 Introduction


This  paper  examines  the  problem  of  allocating  resources  and  computation  services  to  support  the  execution 
of a distributed hard real-time computation. Allocation of resources in “next-generation” real-time operating
systems  requires  some  important  features  in  addition  to  those  demonstrated  by  current  systems,  resulting 
in  an  increased  complexity  of  each  system.  The  new  systems  must  provide  for  general  distribution  issues, 
like  deadlock  prevention,  along  with  supporting  adaptive  behavior  requirements.  In  addition,  there  are 
requirements  to  support  hard  real-time  goals  and  fault  tolerance  objectives.  These  goals  and  objectives  must 
be  guaranteed  to  be  satisfied  even  under  specified  environment  changes. 

In  real-time  operating  systems  the  resource  allocation  has  to  be  related  closely  to  the  scheduling,  and 
the  two  are  based  on  time  considerations.  Scheduling  is  the  mechanism  through  which  the  timing  properties 
of  an  execution  instance  of  a  software  module  are  finalised.  The  allocation  must  also  be  fault  tolerance 
motivated,  to  cope  with  the  reliability  goals  of  an  application. 

The  allocation  scheme  we  propose  here  supports  fault  tolerance  goals  in  both  damage  containment  and 
resiliency requirements. In addition, this allocation scheme accomplishes a deadline satisfaction guarantee
for  all  its  accepted  jobs.  It  does  this  in  cooperation  with  a  schedulability  verification  mechanism,  and  with 
an  object  architecture,  in  which  for  each  object  there  exists  a  calendar  that  relates  time  to  its  execution. 

The paper is organised as follows. In the remainder of this section we briefly review the object architecture
and  the  schedulability  verification  ([1,7,8])  that  support  hard  real-time  environments.  We  also  introduce  some 
tools  that  help  us  deal  with  the  allocation  problem.  In  the  following  section  we  formulate  the  problem  and 
the  conditions  for  a  solution.  We  then  introduce  an  algorithm  that  implements  the  above  solution,  and 
finally  we  investigate  some  of  its  properties.  The  paper  closes  with  some  concluding  remarks. 

1.1  Objects  Architecture 

In [1,7] we have introduced an architecture for designing hard¹ real-time operating systems. This architecture
is based on the use of highly encapsulated entities, called objects. An object is a distinct and selectively
accessible software element that resides on one of the storage resources of the system. The objects architecture
defines the objects as the elements that constitute the system. It also defines their classification,
the relationships between them, the set of operations they are subjected to and execution parameters that
permit scheduling them for execution and access.

In a software engineering context, an object is viewed as an entity whose behavior is characterised by the
operations it is subjected to and the operations it carries out on other objects. The external view of an
object (these operations) is its specification, and the internal view of an object is its implementation. In our
architecture, we have considered the use of object architecture in a system context, thereby expanding the
above object definitions to describe elements and entities in a more general way. Yet, some of the properties
that characterise objects in a software development context are valid in the system architecture context as
well. For example:

•  An object has a state.

•  There is a set of actions to which an object is subjected and a set of actions it requires from other
objects.

•  It is denoted by a name.

•  It has restricted visibility of (as well as by) other objects.

¹ Hard real-time systems are characterised by their property of having a nonrecoverable fault when a computation does not
complete before its deadline.

We have shown that our object-oriented system design methodology provides means to construct systems
with a high degree of deterministic and predictable timing properties [1,7]. This determinism, together with
the required fault tolerance schemes, is a major principle in our time-constraint oriented system. We have
defined a classification of object types, the set of operations each of the object types is associated with, and
their relationships. A conceptual model has been considered in our analysis of the applicability of the objects
architecture for a real-time, distributed, and fault tolerant operating system. Issues of creation, deletion and
access for manipulation and state verification have led us to define the joint, which consists of the following
parts:

•  A context independent pointer to the object’s body, enabling the naming network to support a multi-user,
selective sharing of the object.

•  An  owner/user  justification  structure. 

•  Resource  (and/or  server)  requirements. 

•  A  ticket  check  mechanism  for  the  protection  scheme. 

•  A  time  constraint  for  an  executable  object. 

•  A  replica/alternative  control  mechanism  for  the  fault  tolerance  scheme. 

In our model, objects that relate to each other are connected via the owner/user justifications in the
joint. These relationships are in accordance with the visibility restrictions and the set of operations that
the justificand is subjected to and that the justifier carries out. Operations in this set can change an object’s state,
evaluate the current state of an object, and allow visiting parts of an object. These operations can be carried out
on object bodies as well as on object joints. We can model a system as a graph whose nodes are objects and
whose arcs are directed from justifier to justificand, representing the owner/user relationship. Relationships
between objects necessitate the grouping of objects of the same type into a meta-object, to which the rest of
the objects may refer as a whole entity.

Scheduling executable objects and context initialisation are divided in our model into an on-line part and
an off-line part. The context initialiser consists of an off-line allocator/binder that manages the acceptance
of jobs (requests to execute objects) and allocates the resources before loading, and an on-line loader that
activates schedulable objects that are invoked. The scheduling policy is managed by the off-line scheduler,
which is responsible for recognising the availability of resources, and provides scheduling feasibility verification
testing and reservation facilities. The on-line scheduler carries out locally the policy, the dispatching
and the preemption of loaded executable objects (processes) according to their time constraints.

The  way  in  which  the  above  allocator  works  is  the  major  concern  of  this  paper. 

1.2  Guarantees  in  Hard  Real-Time  Systems 

When a request for a specific object invocation arrives at a hard real-time reactive operating system, the
operating system has to allocate (if feasible) all the resources required such that it is guaranteed that the
object's time constraint is met. Informally speaking, a time constraint is a requirement to start executing a
particular executable object, after a condition is satisfied, and complete the execution before its deadline. The
execution time of the object is assumed to be given, and the constraint is extended to a periodic execution
of the object. Based on previous works that define hard real-time systems (e.g., [13,14]), we have defined a
time constraint formally in [1,7] as the quintuple

$\langle Id,\ T_{aft}(condition_1),\ C_{Id},\ f_{Id},\ T_{bef}(condition_2) \rangle$

where:

$Id$ is the name of the executable object (process) in the proper context,

$T_{aft}(condition_1)$ states after what event execution should begin,

$C_{Id}$ is the computation time of object $Id$,

$f_{Id}$ is the frequency with which the computation should be carried out,

$T_{bef}(condition_2)$ states the deadline which should be met.

The time interval defined by $T_{aft}(condition_1)$ and $T_{bef}(condition_2)$ is the occurrence interval, which
delimits the time domain in which the executable object is allowed to execute. In an interval-based notation,
as we have used in [8], the occurrence interval and the computation interval relate to each other such that
the above quintuple is supported. The occurrence interval is a convex (contiguous) interval, and the
computation interval can be non-convex (since it may contain gaps). Let the j'th occurrence of time constraint
$i$ be denoted as the convex interval $TC_i^{(j)}$, and let $P_i^{(j)}$ be a non-convex interval that represents the union
of all possible execution traces of this computation. Then,

•  $T_{aft}(condition_1) \rightarrow begin_{min}(TC_i^{(j)})$.

•  $T_{bef}(condition_2) \rightarrow end_{max}(TC_i^{(j)})$.

•  $f_i$ periodicity $\rightarrow \forall j > 1 : end_{max}(TC_i^{(j)}) - end_{max}(TC_i^{(j-1)}) = \frac{1}{f_i}$.

•  $C_i$ computation time $\rightarrow \forall j \ge 1 : \|P_i^{(j)}\| = C_i$.
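As a rough illustration of the quintuple and its occurrence intervals, the following Python sketch represents a time constraint as a small record. It assumes that both conditions have already been resolved to absolute times and that successive occurrences are shifted by exactly one period; the class and field names are ours and are not part of the architecture in [1,7].

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class TimeConstraint:
    """Illustrative record for the quintuple; all names are hypothetical."""
    obj_id: str        # Id: name of the executable object
    t_after: float     # time at which the begin condition holds (assumed resolved)
    comp_time: float   # computation time of the object
    frequency: float   # occurrences per time unit, so the period is 1 / frequency
    t_before: float    # deadline of the first occurrence (assumed resolved)

    def occurrence(self, j: int) -> Tuple[float, float]:
        """Convex occurrence interval of the j-th occurrence (j >= 1)."""
        shift = (j - 1) / self.frequency
        return (self.t_after + shift, self.t_before + shift)

    def is_well_formed(self) -> bool:
        """The computation must fit in one occurrence interval and in one period."""
        window = self.t_before - self.t_after
        return 0 < self.comp_time <= window and self.comp_time <= 1 / self.frequency


tc = TimeConstraint("sensor", t_after=2.0, comp_time=1.0, frequency=0.25, t_before=6.0)
first, third = tc.occurrence(1), tc.occurrence(3)
```

In this sketch the ends of consecutive occurrence intervals differ by one period, which mirrors the periodicity relation listed above.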

A real-time operating system must use the time constraint as the key to its decisions on execution
initiation and resource scheduling ([13]). Before execution initiation, the allocation and context initialisation
are required to ensure schedulability of an accepted job. In other words, before loading an object, a positive
feasibility result must show that there exists a schedule that includes the invoked object, according to its
time constraint, with no conflict with the already accepted objects. All the required resources should be
allocated and reserved for the object, assuring that it is going to meet its deadline. It is not only processors
that are to be allocated for an object. An object may rely on other server objects in order to perform its
functions. These other objects may or may not reside at the same site. Remote services necessitate
agents and communication, each of which has to be schedulable within the time constraints of
the invoked object.

One must take into account the time constraints that are projected between different computation localities,
such that each computation locality might have access to a different clock with a different accuracy
and correctness. This projection is discussed in detail in [8]. A chosen (and loaded) allocation should avoid
conflicts when users share a server object, with respect to violations of the user objects’ time constraints.
Each server object is then considered as a resource, and maintains its own schedule. When a server is allocated
to a new user, the binder updates the justification links. The allocated server’s future schedule is to
be checked to show schedulability within the new user’s time constraints, without violating the services this
server has already guaranteed to serve.

Figure 1: Object’s Owner and User Temporal Justification

1.3  Scheduling  Feasibility  Verification 

In [8] we have introduced a formulation and algorithms that verify the feasibility of scheduling an incoming
execution  request,  while  maintaining  and  scheduling  the  requests  which  have  already  been  accepted  before. 
These  algorithms  are  based  on  the  architecture  described  above,  and  the  mechanisms  that  support  the  above 
algorithms. 

Figure 1 describes a temporal justification scheme which is used in our architecture. The same server
object is allocated to different users at different times, hence creating a user justification for this object at
different time intervals. These intervals are in the future, and real-time scheduling decisions
are to be taken according to them. A very important issue arises in the above justification scheme. The time according to
which the decisions are taken is a local and imprecise view of the global time. Distributed computations
may have the same local view at different nodes at different “real” times. Therefore, future projection of
time has to avoid ambiguities and conflicts that originate in differences between local views.

Upon arrival, the time constraint of each incoming request is properly projected, and tested for a possible
insertion into the required object’s calendar. The test for schedule feasibility depends on the scheduling
policy employed. Conditions for schedule feasibility and algorithms for both preemptive and non-preemptive
policies are introduced in [8]. In these algorithms, if there is a feasible schedule, then the time constraint of
the incoming request is inserted in the calendar, reserving the computation interval for it in order to avoid
ambiguity of answers to requests in contention. The space is reserved in a way that ensures that the required
object will not be activated by the scheduler, unless an acknowledgement is sent by the initiator of this
request. Furthermore, the reservation is kept for a limited time, and after this “timeout” elapses without
initiator acknowledgement, this request is removed from the calendar. On the other hand, if the test is
negative, i.e. there is no feasible schedule that does not have a conflict with already guaranteed acceptances,
then a negative answer should be given to the initiator of the incoming constraint. The initiator, in turn,
should remove other requirements that have already been reserved at other calendars (if any), in case this
negative answer prevents its execution.
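The reservation protocol described above (tentative insertion, acknowledgement, timeout, and removal) can be sketched in Python as follows. This is only a minimal illustration: the feasibility test is replaced by a simple non-overlap check on fixed intervals, which is our simplification and not one of the tests of [8], and every class and method name is hypothetical.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Reservation:
    request_id: str
    start: float
    end: float
    expires_at: float           # dropped after this time unless acknowledged
    acknowledged: bool = False


@dataclass
class Calendar:
    timeout: float
    entries: List[Reservation] = field(default_factory=list)

    def purge(self, now: float) -> None:
        """Remove reservations whose timeout elapsed without acknowledgement."""
        self.entries = [r for r in self.entries
                        if r.acknowledged or r.expires_at > now]

    def request(self, request_id: str, start: float, end: float, now: float) -> bool:
        """Tentatively reserve [start, end); False is the negative answer."""
        self.purge(now)
        if any(start < r.end and r.start < end for r in self.entries):
            return False
        self.entries.append(Reservation(request_id, start, end, now + self.timeout))
        return True

    def acknowledge(self, request_id: str, now: float) -> bool:
        """Confirm a reservation so that the scheduler may activate the object."""
        self.purge(now)
        for r in self.entries:
            if r.request_id == request_id:
                r.acknowledged = True
                return True
        return False

    def cancel(self, request_id: str) -> None:
        """Initiator withdraws a reservation after a negative answer elsewhere."""
        self.entries = [r for r in self.entries if r.request_id != request_id]


cal = Calendar(timeout=5.0)
accepted_a = cal.request("a", 10, 20, now=0)
rejected_b = cal.request("b", 15, 25, now=1)      # conflicts with tentative "a"
accepted_b = cal.request("b", 15, 25, now=6)      # "a" timed out unacknowledged
```

The `cancel` method corresponds to the initiator removing reservations it made at other calendars once one of its requests has been refused.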

The  allocation  model  that  is  presented  in  this  paper  assumes  the  existence  of  the  above  mechanism.  It  is 
shown  in  [6]  that  this  mechanism  allows  guaranteeing  deadline  satisfaction.  In  addition,  it  is  shown  later  in 
this  paper  that  this  mechanism  not  only  supports  mutual  exclusion  from  the  contention  point  of  view,  but 
it  also  prevents  deadlocks  that  might  arise  from  some  cyclic  dependency  in  the  global  computation  graph. 

2  Problem  Definition  and  Formulation 

The  problem  of  allocating  the  execution  of  computation  elements  to  computation  resources  has  been  studied 
with  respect  to  many  dimensions  of  that  problem.  In  most  of  the  cases  we  have  found  in  the  literature, 
the  goal  of  the  allocation  has  been  an  optimisation  of  some  metric  of  the  execution  performance,  generally 
one stochastic parameter of the performance description. The model which has mostly been used in the
above cases reflected an allocation of processes to processors, while both the set of processes and the set of
processors have been subjected to some inter-set relations and intra-set optimality constraints. In addition, in
most of the above cases the nature of each of the processors was homogeneous, indivisible, and self-contained.

We  start  this  section  with  a  review  of  some  recent  important  works  that  have  to  do  with  allocation  of 
real-time computation elements under high reliability requirements. We then introduce our computation
model  and  our  allocation  goals,  and  define  the  requirements  and  the  conditions  for  these  goals  to  be  met.  A 
brief  review  of  the  graph  properties  we  use  later  is  given  for  the  reader’s  convenience. 

2.1  Review  of  Some  Allocation  Approaches 

2.1.1  Allocation  with  IPC  Minimisation 

A  centrally  controlled  allocation  scheme  is  described  in  [10,11],  a  scheme  which  has  been  used  in  the  BMD 
project.  There,  a  nominated  computation  node  has  the  knowledge  of  the  global  status  of  the  system,  and 
each  request  for  task  execution  passes  this  nominated  node  to  be  properly  allocated  to  resources.  The  model 
uses  only  tasks  and  processors.  The  relations  between  tasks  and  processors  are  given  in  matrices,  according 
to  which  the  allocation  is  done. 

•  Task Preference Matrix P where p[i, j] = 0 means that task i is not allowed to execute on processor j.

•  Task Exclusion Matrix X where x[i, j] ≠ 0 means that task i and task j cannot be assigned to the
same processor.

•  Task Coupling Matrix C where c[i, j] are the coupling factors that represent the amount of data
transferred from task i to task j.

•  Task Distance Matrix D where d[i, j] represents the cost of transferring one data unit from task i to
task j.


The  allocation  is  considered  as  a  search  tree.  In  this  tree,  each  vertex  is  a  task  to  be  allocated,  and  each 
arc  that  leaves  the  vertex  is  a  possible  allocation  of  a  processor  to  that  task.  The  search  algorithm  is  based 
on  the  branch  and  bound  method,  and  is  constructed  in  setting  and  backtracking  phases.  The  search  goal  is 
an  allocation  that  minimises  the  execution  time  of  a  “port-to-port”  thread  of  executing  tasks.  The  execution 
cost  function  consists  of  the  following  ([10]): 

1.  Execution time of the task on the processor, which depends on the task size and the processor MIPS
rate:

$E_i = \frac{size(task_i)}{\text{processor MIPS rate}}$

2.  The network and operating system overhead (OV), which is used for concurrency control, integrity
checking, recovery check-point update, etc.

3.  Inter-processor communication (IPC), which is higher if communicants reside on different processors.

4.  Waiting time (WT), which is consumed when the task waits in the processor enablement queue. This
figure depends highly on the sizes and number of tasks, the processor load, and the number of enablements
(especially if large tasks are assigned to the same processor).

The search algorithm ([11]) eliminates search in improper subtrees, while branching to a new subtree
according to matrix P, matrix X, and the maximal capacity of each assigned processor. Preference is imposed
with dominance relations, OV, and WT. The IPC cost is computed for each subtree through matrices C
and D, as $\sum_{i,j} c[i,j] \cdot d[i,j]$. The lowest cost solution is chosen out of the set of possible solutions.
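The search can be illustrated by a small branch-and-bound sketch in Python. It is a simplification under stated assumptions, not the algorithm of [10,11]: only the IPC term is used as the cost, the distance entry is looked up for the processors hosting the two tasks (our reading of matrix D), capacity is counted in tasks per processor, and the function name `allocate` is hypothetical.

```python
from typing import List, Optional, Tuple


def allocate(p, x, c, d, capacity) -> Tuple[Optional[List[int]], float]:
    """Depth-first branch and bound over task-to-processor assignments.

    p[i][k] == 0 forbids task i on processor k, x[i][j] != 0 forbids tasks i
    and j on the same processor, c[i][j] is the data volume from task i to
    task j, d[k][l] is the assumed per-unit cost between processors k and l.
    """
    n_tasks, n_procs = len(p), len(capacity)
    best: Optional[List[int]] = None
    best_cost = float("inf")
    assign: List[int] = []          # assign[i] is the processor of task i

    def cost() -> float:
        n = len(assign)
        return sum(c[i][j] * d[assign[i]][assign[j]]
                   for i in range(n) for j in range(n))

    def allowed(task: int, proc: int) -> bool:
        if p[task][proc] == 0:
            return False
        if assign.count(proc) >= capacity[proc]:
            return False
        return all(not (x[task][t] != 0 and assign[t] == proc)
                   for t in range(len(assign)))

    def search() -> None:
        nonlocal best, best_cost
        current = cost()
        if current >= best_cost:    # bound: costs are non-negative, so prune
            return
        if len(assign) == n_tasks:
            best, best_cost = list(assign), current
            return
        task = len(assign)
        for proc in range(n_procs):  # branch on the next task's processor
            if allowed(task, proc):
                assign.append(proc)
                search()
                assign.pop()         # backtrack

    search()
    return best, best_cost


P = [[1, 0], [1, 1], [1, 1]]                 # task 0 may only run on processor 0
X = [[0, 1, 0], [1, 0, 0], [0, 0, 0]]        # tasks 0 and 1 must be separated
C = [[0, 0, 1], [0, 0, 5], [0, 0, 0]]
D = [[0, 1], [1, 0]]
result = allocate(P, X, C, D, capacity=[2, 2])
```

Pruning on the partial cost is valid here because every term of the sum is non-negative, so a partial assignment can never become cheaper as more tasks are placed.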

The major disadvantages of this algorithm are its centralistic nature and the requirements for global
knowledge. Furthermore, no time constraints are taken into account at a task level, and no fault tolerance
goals are set. The above disadvantages indicate that this allocation scheme is not suitable for
“next-generation” real-time operating systems.

2.1.2  Allocation  with  Bottleneck  Processor  Load  Minimisation 

In [2], an objective function is suggested for the problem of allocation of tasks to processors, using an
optimisation constraint of minimising the load on the bottleneck processor, i.e. the most heavily loaded
processor. The algorithm is presented in section B.1.

The algorithm assumes that the load on a processor is a function of the inter-module communication
(IMC), the accumulative execution time (AET) of the modules, and the precedence relations (PR) between
executing modules. It defines the problem on a set of J modules, $p_1, \ldots, p_J$, and a set of S processors. The
AET of module $p_j$ during a particular time interval can be derived from the number of times $p_j$ executes
during this interval, and the average execution time of $p_j$ over peak load periods. The AETs of the modules
are assumed to be known, and are denoted as $\{T_j : 1 \le j \le J\}$. IMC in this approach incurs the inter-process
communication cost (IPC) and the processing overhead. IPC can be significantly reduced if the
allocation assigns pairs of heavily communicating processes to the same processor. The workload on a given
processor ($P_r$), under a given assignment of the J modules to the S processors, is defined as

$\mathcal{L}(P_r; X) = AET(P_r; X) + IMC(P_r; X)$

where $X$ is an assignment matrix $[x_{i,j}]$, for which $x_{i,j} = 1$ if $p_i$ is assigned to processor $j$.

The assumptions taken in [2] on IPC allow the selection of a model of communication cost as a sum of
the cost of outgoing messages and the cost of incoming messages, whereas the costs of module enablements
and control messages are ignored.

$IMC_{i,j} = IPC(i, j; X) + IPC(j, i; X).$

Given the average inter-module communication cost at peak load periods for every pair of modules
$1 \le i, j \le J$, which can be calculated from the volume of the communication between the modules, one can
derive the IMC.

Thus, the load at $r$ can be expressed as

$\mathcal{L}(P_r; X) = \sum_{j=1}^{J} x_{j,r} T_j + \sum_{s=1, s \ne r}^{S} IPC(r, s; X) + \sum_{s=1, s \ne r}^{S} IPC(s, r; X).$

The bottleneck processor load is

$bottleneck(X) = \max_{1 \le r \le S} \{\mathcal{L}(P_r; X)\}$

and minimising this load is

$\min \{bottleneck(X)\}$

or

$\min \{\max_{1 \le r \le S} \{AET(P_r; X) + IMC(P_r; X)\}\}.$
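To make the objective concrete, the Python sketch below evaluates the load of each processor and the bottleneck load for a given assignment, and finds the minimum by exhaustive enumeration. Two things are our assumptions and do not come from [2]: the processor-level cost IPC(r, s; X) is obtained by summing a module-level cost matrix `comm` over the module pairs placed on r and s, and the exhaustive search stands in for the iterative algorithm of section B.1.

```python
from itertools import product
from typing import List, Sequence, Tuple


def load(r: int, x: Sequence[Sequence[int]], aet: Sequence[float],
         comm: Sequence[Sequence[float]]) -> float:
    """Load of processor r: execution time plus outgoing and incoming IPC."""
    n_mod, n_proc = len(aet), len(x[0])

    def ipc(a: int, b: int) -> float:
        # Assumed model: cost of messages from modules on a to modules on b.
        return sum(x[i][a] * x[j][b] * comm[i][j]
                   for i in range(n_mod) for j in range(n_mod))

    execution = sum(x[j][r] * aet[j] for j in range(n_mod))
    outgoing = sum(ipc(r, s) for s in range(n_proc) if s != r)
    incoming = sum(ipc(s, r) for s in range(n_proc) if s != r)
    return execution + outgoing + incoming


def bottleneck(x, aet, comm) -> float:
    """Load of the most heavily loaded processor under assignment x."""
    return max(load(r, x, aet, comm) for r in range(len(x[0])))


def minimise_bottleneck(aet, comm, n_proc: int) -> Tuple[float, Tuple[int, ...]]:
    """Exhaustive search; only practical for very small instances."""
    best = None
    for placement in product(range(n_proc), repeat=len(aet)):
        x: List[List[int]] = [[1 if placement[j] == r else 0 for r in range(n_proc)]
                              for j in range(len(aet))]
        value = bottleneck(x, aet, comm)
        if best is None or value < best[0]:
            best = (value, placement)
    return best


aet = [4, 3, 2]
comm = [[0, 5, 0], [0, 0, 1], [0, 0, 0]]
best_value, best_placement = minimise_bottleneck(aet, comm, n_proc=2)
```

In this example the two heavily communicating modules end up on the same processor, which is the effect the text attributes to reducing IPC.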

Precedence relations (PR) affect the response time of the system, and this aspect is included in this
algorithm. In [2], a model of wait-time behavior is constructed, based on the observed relation between the size
ratio of modules, $p_{ij}$, and the wait-time ratio, $R(p_{ij})$. The algorithm then uses the PR index

j)  =  1  - 

The algorithm in section B.1 presents an iterative approach in which the workload $\mathcal{L}(P_{bottleneck}; X)$ is
recorded for different tuning scale factors $\alpha$ and $\beta$. $\alpha$ represents the scale factor of combining the IMC index with
the PR index, and $\beta$ is a scale factor for the threshold of processor load on which combining is decided.

The P-I-A algorithm disregards loads on processors due to computations other than $p_1, \ldots, p_J$,
and therefore, in order to allow independent computations, it requires extra knowledge. Being centralistic
itself implies that this global knowledge must be centralistic or static. Two major properties of hard real-time
systems are not dealt with here, since no time constraints are imposed on module execution, and no fault
tolerance objectives are defined.


2.1.3  NEXT-FIT-M  Partitioning  for  Rate  Monotonic  Schedulers 

In [4], an on-line algorithm of O(n) time complexity and O(1) space complexity is introduced, to partition
a set of tasks such that each partition will be scheduled later for execution at a distinct locality by a rate-monotonic
priority scheduling algorithm. The allocation is centralistic, while the scheduling is distributed.
A subgoal of the algorithm is to use as few processors as possible.

In  this  model,  tasks  have  the  following  characteristics: 


•  Each  task  has  a  constant  period. 

•  Each  task  has  a  deadline  constraint,  and  no  begin-time  constraint  is  imposed. 

•  Tasks  are  independent  of  each  other,  without  precedence  constraints. 

•  All  the  tasks  require  the  same  computation  time-interval. 

The tasks are partitioned according to their duty cycle, which is the ratio between the identical computation
interval and the task’s period. They are then assigned to processors such that the processors will
schedule them with a rate monotonic scheduling algorithm. The utilization achievable by each rate monotonic
scheduler is known to be bounded ([9]), and therefore the allocation maintains the load allocated to each
processor such that it does not exceed that bound.

The allocation algorithm is described in section B.2. The tasks are divided into M classes according to
the duty cycle $u_i$ of each task $T_i$, such that

•  task $T_i \in$ class $k$ if $2^{1/(k+1)} - 1 < u_i \le 2^{1/k} - 1$, for $1 \le k < M$.

•  task $T_i \in$ class $M$ if $0 < u_i \le 2^{1/M} - 1$.

The algorithm assigns $k$ class-$k$ tasks to each class-$k$ processor, keeping the utilization factor of the class-$M$
processor less than $\ln(2)$.
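The classification and assignment rule can be sketched in Python as follows. The sketch follows our reading of the class boundaries and of the next-fit bookkeeping (one currently open processor per class); the authoritative description is the one in section B.2 and [4], and the function names are hypothetical.

```python
import math
from typing import Dict, List, Sequence, Tuple


def task_class(u: float, m: int) -> int:
    """Class (1..m) of a task with duty cycle u, 0 < u <= 1."""
    if not 0 < u <= 1:
        raise ValueError("duty cycle must lie in (0, 1]")
    for k in range(1, m):
        if 2 ** (1 / (k + 1)) - 1 < u <= 2 ** (1 / k) - 1:
            return k
    return m


def next_fit_m(duty_cycles: Sequence[float], m: int) -> Tuple[List[int], int]:
    """Assign tasks, in arrival order, to processors; return (assignment, count)."""
    assignment: List[int] = []
    n_procs = 0
    # Per class: [open processor index, tasks placed (k < m) or utilization (k == m)]
    current: Dict[int, List[float]] = {}
    for u in duty_cycles:
        k = task_class(u, m)
        slot = current.get(k)
        if k < m:
            full = slot is None or slot[1] >= k              # k tasks per class-k processor
        else:
            full = slot is None or slot[1] + u >= math.log(2)  # keep class-M load below ln 2
        if full:
            slot = [n_procs, 0]
            current[k] = slot
            n_procs += 1
        slot[1] += 1 if k < m else u
        assignment.append(int(slot[0]))
    return assignment, n_procs


placement, used = next_fit_m([0.5, 0.3, 0.35, 0.1, 0.2, 0.6, 0.45, 0.3], m=3)
```

Each task is examined once and only one open processor per class is remembered, which is consistent with the linear time and constant space stated for the algorithm when M is fixed.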

The partitioning mechanism of this allocation algorithm is based on the use of local rate-monotonic priority
schedulers, and it is therefore totally scheduler dependent. Even so, the model of the above scheduler is
too simple to support “new-generation” real-time applications. The absence of begin-time constraints, the
lack of support for a variety of computation requirements, the absence of important relations between tasks
(precedence and others), and the disregard of loads on processors due to other computations are shortcomings
of this approach. Furthermore, it fails to support any fault tolerance goals, and thus
does not give a comprehensive solution. However, we find the relationship demonstrated in [4] between an
allocator and local schedulers important and useful.


2.1.4  Allocation  with  Load  Balance  Optimality  Constraint 


In  [6,5],  allocation  of  processes  to  processors  is  examined  with  respect  to  a  distributed  load  balancing 
optimality  constraint,  and  groups  of  processes  are  relocated  when  one  or  more  processes  fail. 

A set of processes $\mathcal{P}_p = \{p_1, \ldots, p_J\}$ are related to each other through a set of logical links $\mathcal{L}_p$, to form
a graph

$\mathcal{G}_p = (\mathcal{P}_p, \mathcal{L}_p).$

A set of processors $\mathcal{P}_P = \{P_1, \ldots, P_S\}$ are related to each other through a set of physical links $\mathcal{L}_P$, to form
a graph

$\mathcal{G}_P = (\mathcal{P}_P, \mathcal{L}_P).$

Each node in the above two graphs can be measured according to its incoming and outgoing links, applying
some weights to the links to express communication costs. These measures can be used as similarity
(clustering) measures, according to which each of the graphs is represented by a cluster tree, $\tau_p$ and $\tau_P$,
respectively. The allocation algorithm in [6] maps the nodes in $\tau_p$ to the nodes of $\tau_P$. The motivation
of this allocation is to assign heavily communicating processes to heavily connected processors.

In this approach, all the processes are assumed to be roughly equal in the load they impose when assigned
to a processor, and this load is assumed to be a unit load. Each processor $P_i$ is assumed to have a current
assigned load denoted $c_i$. The current load is bounded by the processor capacity $C_i$, and is required to
satisfy an optimality workload constraint

$\left| \frac{c_i}{C_i} - \lambda \right| \le \epsilon$

where $\lambda$ is the optimal load and $\epsilon$ is a tolerance. The relation

$m_i \le c_i \le M_i$

is another way to express the optimality constraint, where

$m_i = C_i \cdot (\lambda - \epsilon), \quad M_i = C_i \cdot (\lambda + \epsilon).$

When a cluster of processors is observed, the sum of its processors’ capacities expresses the cluster’s
capacity, and the sum of the currently assigned processor loads is the current cluster load. A metric that
represents the violation of optimality in cluster $j$ can be expressed as

$v_j = \left| \frac{c_j}{C_j} - \lambda \right| - \epsilon,$

where $c_j$ and $C_j$ here denote the load and capacity of cluster $j$.

The ALLOCATE algorithm presented in [6] uses the violation values of the children of a node in the processor
cluster tree in order to select a candidate cluster of processors to which processes are to be assigned. The
highest violation is selected first.
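The selection step can be illustrated with a few lines of Python. The violation formula used here is our reading of the metric, namely the distance of the relative cluster load from the optimal load minus the tolerance; the tuple layout and function names are hypothetical and this is not the ALLOCATE algorithm of [6] itself.

```python
from typing import Sequence, Tuple

Cluster = Tuple[str, float, float]   # (name, current load, capacity)


def violation(load: float, capacity: float, lam: float, eps: float) -> float:
    """Positive when the relative load lies outside [lam - eps, lam + eps]."""
    return abs(load / capacity - lam) - eps


def select_cluster(children: Sequence[Cluster], lam: float, eps: float) -> Cluster:
    """Pick the child cluster with the highest violation value."""
    return max(children, key=lambda ch: violation(ch[1], ch[2], lam, eps))


children = [("A", 1, 10), ("B", 4, 10), ("C", 5, 10)]
chosen = select_cluster(children, lam=0.5, eps=0.1)
```

A cluster whose relative load sits inside the tolerance band has a negative violation and is therefore considered last.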

In order to support fault tolerance objectives, the occurrence of a fault must first be detected. A cluster
of processors that monitor each other's status and participate in the detection algorithm is called a detection
unit. The set of S processors is therefore divided into detection units $D_1, \ldots, D_k$, and each detection unit $D_i$
is assigned $N_i$ processes.

Each  detection  unit  is  ordered,  to  have  a  Leader,  second  in  command,  etc.  The  assumption  that  no 
failure  occurs  while  recovering  from  a  previous  failure,  allows  replacing  a  Leader  that  has  failed  using  a 
simple  protocol.  Each  Leader  maintains  some  knowledge  in  order  to  answer  questions  of  other  Leaders  that 
cannot  relocate  in  their  own  detection  unit.  Each  Leader  maintains  additional  information  for  its  relocation 
management,  both  for  relocating  locally  (within  the  detection  unit)  and  for  relocating  externally  (moving 
processes  to  another  detection  unit). 

When a processor $P_j$ is detected to have failed in detection unit $D_i$, its capacity is removed from the
total capacity of the detection unit. The Leader of $D_i$ checks if the fault can be dealt with locally, by a local
reconfiguration of the allocation within $D_i$. If this is the case, then

$m(D_i) - m_j \le N_i \le M(D_i) - M_j$

and the actions taken are described in section B.3.1. The benefits of a local relocation are the isolation of
failures from other detection units and the minimisation of forced departures from optimality.

However, if it cannot be treated locally, the Leader must generate a candidate set of $(v, k)$ pairs, to select both the node $v$ to be migrated and the destination detection unit $D_k$. For each node $v$ to be migrated to detection unit $D_k$, three cost issues are raised:

1. migration cost, $M(v, k)$, which mainly consists of a fixed overhead and a cost which is proportional to the number of leaf nodes that are descendants of $v$,

2. affinity cost, $A(v)$, which originates in the increased logical communication between $D_i$ and the migration destination, and therefore depends on the links of the migrated node,

3. utilization cost, $B(v, k)$, which is a measure of the unbalanced load and the violation of the optimality measure in $D_i$ and in the destination detection unit.

Combining the above three costs into a ranking measure yields

$$R(v, k) = C_1 \cdot A(v) + C_2 \cdot B(v, k) + C_3 \cdot M(v, k).$$

An  algorithm  which  is  motivated  by  the  above  ranking  is  described  in  section  B.3.2. 
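
A minimal sketch of how such a ranking could drive the choice of a candidate pair is given below. It assumes that a lower rank is preferable, since all three terms are costs, and that the weights are supplied by the system designer. The function names and the sample cost values are hypothetical.

```python
def rank(affinity, utilization, migration, weights=(1.0, 1.0, 1.0)):
    """Weighted sum of the affinity, utilization and migration costs of one (v, k) pair."""
    c1, c2, c3 = weights
    return c1 * affinity + c2 * utilization + c3 * migration


def best_candidate(candidates, weights=(1.0, 1.0, 1.0)):
    """candidates maps (node, unit) -> (A, B, M) costs; return the pair with the lowest rank."""
    return min(candidates, key=lambda vk: rank(*candidates[vk], weights=weights))


candidates = {
    ("v1", "D2"): (2.0, 1.0, 3.0),
    ("v1", "D3"): (1.0, 0.5, 3.0),
    ("v2", "D2"): (0.5, 2.0, 1.0),
}
choice = best_candidate(candidates)
```

Changing the weights shifts the balance between keeping communicating nodes together, keeping the load balanced, and keeping the migration itself cheap.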

We find the relocation approach very appealing from a fault tolerance point of view. However, this approach does not satisfy hard real-time system requirements, because it does not take deadlines into account in its clustering measure (e.g. in ALLOCATE), and because in many cases recovery through a roll-back (e.g. in its migration solution) is impossible.


2.1.5  Heuristic  Approaches 

In  general,  the  mapping  of  timing  constraints  plus  the  precedence  relations  onto  resource  allocation  in  a 
multi-processor environment is an NP-hard problem ([13,14]). This fact motivates the search for heuristics
that provide sub-optimal solutions for the hard real-time allocation and scheduling problems. Some heuristic
approaches taken in scheduling ([17,18]) suggest some interesting ideas with respect to the allocation scheme.
An  example  of  a  heuristic  scheduler,  the  one  used  in  the  Spring  operating  system,  is  given  in  section  B.4. 

In [17], at each level of the search tree, the scheduler updates a vector of the Dynamic Resource Demand Ratio

$$DRDR = (DRDR_1, \ldots, DRDR_i, \ldots, DRDR_r)$$

whose component $i$ indicates the fraction of resource $R_i$ to be used by the tasks not yet scheduled:

$$DRDR_i = \frac{\sum_T (c_T : T \text{ remains to be scheduled} \wedge T \text{ uses } R_i)}{\max_T (d_T : T \text{ remains to be scheduled} \wedge T \text{ uses } R_i) - EAT_i}$$

where $EAT_i$ is the earliest available time of resource $R_i$, and $d_T$ and $c_T$ are the deadline of task $T$ and its computation time, respectively. One should notice that all the resources are reserved for the whole computation time. When a search decision is to be taken regarding the schedule feasibility, as in

if strongly-feasible(task_set, schedule) then ...

one should also check

$$\forall i = 1, \ldots, r : DRDR_i \le 1.$$
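
The demand ratio check can be illustrated with a short Python sketch. It assumes that a ratio of exactly 1 is still acceptable and that a resource with no remaining users has a ratio of 0. The `Task` class and the function names are hypothetical and are not the interface of the scheduler in [17].

```python
from dataclasses import dataclass
from typing import Dict, FrozenSet, List


@dataclass(frozen=True)
class Task:
    name: str
    computation: float         # c_T
    deadline: float            # d_T
    resources: FrozenSet[str]  # resources the task uses


def drdr(remaining: List[Task], resource: str, eat: float) -> float:
    """Demand on `resource` by unscheduled tasks, relative to the time left before their latest deadline."""
    users = [t for t in remaining if resource in t.resources]
    if not users:
        return 0.0
    demand = sum(t.computation for t in users)
    window = max(t.deadline for t in users) - eat
    return float("inf") if window <= 0 else demand / window


def passes_drdr_check(remaining: List[Task], eat_by_resource: Dict[str, float]) -> bool:
    """True when no resource is asked for more time than is available."""
    return all(drdr(remaining, r, eat) <= 1.0 for r, eat in eat_by_resource.items())


tasks = [
    Task("T1", 2.0, 10.0, frozenset({"R1"})),
    Task("T2", 3.0, 8.0, frozenset({"R1", "R2"})),
]
feasible = passes_drdr_check(tasks, {"R1": 4.0, "R2": 6.0})
```

Here R2 becomes available at time 6 while T2 needs 3 time units before its deadline at 8, so the partial schedule can be pruned without expanding it further.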

In [18], the scheduler allows preemption, and thus each resource may be required in one of the following three modes:

• exclusive,

• shared,

• not needed.

At each level of the search tree, the scheduler updates another vector, the Minimum Resource Demand Ratio

$$MRDR = (MRDR_1, \ldots, MRDR_r)$$

whose component $i$ indicates the fraction of resource $R_i$ to be used by the tasks not yet scheduled:

$$MRDR_i = \frac{\sum_{T \in R_i^{exclusive}} (c'_T : T \text{ remains to be scheduled}) + \max_{T \in R_i^{shared}} (c'_T : T \text{ remains to be scheduled})}{\max_T (d_T : T \text{ uses } R_i \wedge c'_T > 0) - EAT_i}$$

with the terms defined as above, except for $c'_T$, which is the remaining execution time of task $T$.
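
The following Python sketch shows one way to evaluate this ratio. It assumes that exclusive users add up while shared users can overlap, so that only the longest shared user counts. The `PTask` class, the mode constants, and the function name are hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, List

EXCLUSIVE, SHARED = "exclusive", "shared"


@dataclass
class PTask:
    name: str
    remaining: float       # c'_T, the remaining execution time
    deadline: float        # d_T
    modes: Dict[str, str]  # resource -> mode; an absent resource is "not needed"


def mrdr(tasks: List[PTask], resource: str, eat: float) -> float:
    """Minimum demand on `resource`, relative to the time left before the latest deadline."""
    active = [t for t in tasks if resource in t.modes and t.remaining > 0]
    if not active:
        return 0.0
    exclusive = sum(t.remaining for t in active if t.modes[resource] == EXCLUSIVE)
    shared = max((t.remaining for t in active if t.modes[resource] == SHARED), default=0.0)
    window = max(t.deadline for t in active) - eat
    return float("inf") if window <= 0 else (exclusive + shared) / window


tasks = [
    PTask("A", 2.0, 10.0, {"R1": EXCLUSIVE}),
    PTask("B", 3.0, 12.0, {"R1": SHARED}),
    PTask("C", 1.0, 9.0, {"R1": SHARED}),
]
ratio = mrdr(tasks, "R1", eat=2.0)
```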


2.2  Model  Description 

Our  model  of  computation  is  a  system  constructed  from  objects  and  resources.  The  objects  that  participate 
in a computation are related to each other via semantic links that are pointed to by the object joints. The
temporal  properties  of  each  relation  are  expressed  as  either  convex  or  non-convex  time  intervals  in  a  calendar 
within  the  relevant  joint.  In  that  respect,  resources  also  can  be  viewed  as  objects.  However,  we  distinguish 
between  the  two  for  differences  in  fault  tolerance  properties  that  are  related  to  monotonicity  of  faults.  The 
distinction  is  also  related  to  properties  that  concern  damage  containment  in  case  of  faults.  The  properties 
of  the  resources  may  allow  us  to  model  the  system  elements  in  terms  of  resource  segments.  For  example,  we 
may  model  one  particular  memory  page  as  a  resource  if  we  can  detect  a  failure  at  this  level  of  resolution, 
and  trigger  an  off-line  recovery  at  the  same  level.  On  the  other  hand,  if  we  cannot  do  the  above,  we  may 
model  the  whole  memory  at  a  given  locality  as  a  resource,  or  even  the  whole  locality  (i.e.  the  processors, 
the  memory,  the  devices,  etc.)  as  a  single  resource. 

Executing and "to-be-executed" objects are to be allocated as system resources, each having its own joint
and  calendar.  These  resources  are  physically  linked  according  to  their  geographic  and  hardware  constraints. 
However,  in  addition  to  resources,  services  provided  by  objects  may  need  to  be  allocated  to  other  objects. 
Each  of  these  services  may  need  other  services  and  resources,  and  so  on.  We  present  this  as  a  graph,  where 
objects  and  resources  are  represented  as  nodes,  and  the  relations  are  represented  as  directed  arcs.  Note  that 
resources  are  always  the  leaf-nodes,  since  a  resource  is  not  expected  to  need  services  from  other  resources. 

The distinction between transient and monotonic faults, as expressed in our object/resource model, allows
the use of two possible recovery mechanisms. We denote the most common one as temporal redundancy, in
which  a  "retry"  effort  is  executed  upon  a  fault  detection.  This  mechanism  is  perfectly  suited  for  faults  whose 
existence  may  be  a  transient  phenomenon.  It  also  permits  roll-back  recovery.  Real-time  constraints  may 
conflict  with  temporal  redundancy,  because  the  time  needed  for  recovery  may  not  exist.  Furthermore,  in 
case  of  a  monotonic  failure,  retrying  is  ineffective.  In  such  cases  only  physical  redundancy  can  increase  the 
system  resiliency.  Roll-forward  recovery  and  the  N-version  programming  are  examples  of  such  redundancy. 

In  Figure  2  we  give  an  example  of  the  two  mechanisms.  Objects  a  and  b  are  allocated  with  temporal 
redundancy,  while  object  e  has  a  physical-redundancy  in  object  d.  In  the  model  defined  below,  resources 
are  to  be  subjected  only  to  physical  redundancy,  while  redundancy  of  objects  is  defined  by  the  computation 
designer.

One major obstacle that the allocation and relocation mechanisms must overcome is shown in Figure 3. Although objects $B_1$ and $B_2$ are physically redundant, and so are objects $C_3$ and $C_4$, the allocation in the figure results in a 0-resilient computation. A failure of any one of the four resources results in a computation fault, since both redundant threads depend on all four resources. If $B_1$ is allocated with $C_3$, and $B_2$ with $C_4$, the outcome is a 1-resilient computation.

2.3  Conditions  and  Formulation 

Let each executable object instance $p$ have a set of resource requirements $\{R_i^{(p)}\}$ and service requirements $\{S_i^{(p)}\}$, called its dependency set, which we denote as $DS_p$. Restricting $p$ with a time constraint $TC_p$ implies a projected time constraint on each member of its dependency set. Each projection is a result of the temporal relation between $p$'s execution and its requirements. A service requirement can be executed by another executable object instance, which can be chosen out of a set of alternatives. Hence, we can define the dependency set as follows.

Definition 1 The dependency set of an object $p$ with a time constraint $TC_p$ is

$$DS_{p,TC_p} = \{\, \{\langle R_i^{(p)}, TC_R^{(p)}(i) \rangle : 1 \le i \le k\},\ \{S_i^{(p)} : 1 \le i \le n\} \,\}$$

where

$$S_i^{(p)} = \{\langle s_j^{(p)}(i), TC_j^{(p)}(i) \rangle : 1 \le j \le M^{(p)}(i)\}$$

and $M^{(p)}(i)$ is the number of service alternatives of service requirement $S_i^{(p)}$.

Consider  a  graph  that  models  the  dependency  relations  between  an  object  p  and  its  requirements,  denoting 
each  relation  by  a  directed  arc  from  an  object  to  a  member  of  its  dependency  set.  If  a  member  of  the 
dependency  set  is  another  object  q,  then  q’s  dependency  graph  is  a  sub-graph  of  p’s  dependency  graph. 

Definition  2  The  dependency  graph  of  an  object  is  a  graph  in  which  the  object  is  represented  as  a  node, 
and  directed  arcs  connect  this  node  to  the  dependency  graphs  of  the  members  of  its  dependency  set. 

We  can  also  define  the  set  of  members  in  each  sub-graph  as  follows. 

Definition 3 A reachability set of an object $p_i$ is the set of all objects $p_k$ such that there exists a finite path from $p_i$ to $p_k$ in the dependency graph of $p_i$.
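
The dependency graph and the reachability set can be sketched as a small Python data structure. This is only an illustration: the `ObjectNode` class is hypothetical, resources are kept as plain names because they are always leaves, and the sketch assumes that an object belongs to its own reachability set only if some path leads back to it.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Set


@dataclass
class ObjectNode:
    name: str
    resources: List[str] = field(default_factory=list)       # resource requirements
    services: List[List[str]] = field(default_factory=list)  # one list of alternatives per service requirement


def reachability_set(graph: Dict[str, ObjectNode], start: str) -> Set[str]:
    """All objects reachable from `start` by following service alternatives."""
    seen: Set[str] = set()
    stack = [start]
    while stack:
        node = graph[stack.pop()]
        for alternatives in node.services:
            for alt in alternatives:
                if alt not in seen:
                    seen.add(alt)
                    stack.append(alt)
    return seen


graph = {
    "p": ObjectNode("p", ["cpu0"], [["q1", "q2"], ["r"]]),
    "q1": ObjectNode("q1", ["disk0"], [["s"]]),
    "q2": ObjectNode("q2", ["disk1"]),
    "r": ObjectNode("r", ["net0"]),
    "s": ObjectNode("s"),
}
reachable = reachability_set(graph, "p")
```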

In [7] we have shown that for a non-preemptive scheduling discipline we require a totally-disjoint ($\bowtie$) relation between all the computations, as well as avoiding conflicts between windows of occurrence and their corresponding non-convex computation intervals.

In the following definitions, we use notation which we adopted in [7]. Each convex interval, say $A$, is assumed to be delimited by $t_-^A$ and $t_+^A$. The leftmost convex sub-interval of a non-convex interval $B$ is denoted as ${<}B$, and respectively the rightmost one by ${>}B$. $\sqcup$ is the interval cover operator. Finally, the interval containment property is denoted by $\supset$, the interval intersection operator by $\cap$, and the empty set by $\phi$.

Definition 4 The laxity of a (computation) non-convex interval $P_i$ that is constrained within a (window) convex interval $TC_i$ is defined by the pair $(x_-^{TC_i}, x_+^{TC_i})$, such that

$$x_-^{TC_i} = begin(P_i^L) - begin(TC_i) = t_-^{P_i^L} - t_-^{TC_i}$$

$$x_+^{TC_i} = end(TC_i) - end(P_i^R) = t_+^{TC_i} - t_+^{P_i^R}$$

where $P_i^L = {<}P_i$ and $P_i^R = {>}P_i$. □


Having defined the laxity, one can phrase the condition for non-preemptive schedulability.


Condition 1 Let the incoming time constraint have an occurrence window $TC_{in}$ and a non-convex computation requirement $P_{in}$. Let $V$ be the verification interval, derived from the duration of a time constraint $TC_x$, for which $TC_x = TC_{in}$ or $TC_x \supset TC_{in}$, such that $I^o$, the set of already accepted time constraints that intersect with the verification window $V$, satisfies

$$\nexists\, TC^{(j)} \in I^o : TC^{(j)} \supset TC_x.$$

The incoming time constraint is non-preemptively schedulable if

$$\forall i\, \forall j : TC^{(j)} \in I^o : \exists x_-^{TC^{(j)}} \ge 0,\ \exists x_+^{TC^{(j)}} \ge 0,\ \exists x_-^{TC_{in}} \ge 0,\ \exists x_+^{TC_{in}} \ge 0 : \forall P_i^{(j)} \in P^o : P_{in} \bowtie P_i^{(j)}$$

where $P^o$ is the set $\{P_i^{(j)} \mid TC^{(j)} \in I^o\}$. □

In  the  preemptive  schedule,  we  now  take  into  account  the  non-convex  nature  of  the  computation  intervals. 

Definition 5 The set of maximal convex subintervals of convex time intervals $A$ and $B$ is defined as $S(\{A\}, \{B\})$, such that

• $A \cap B = \phi \Rightarrow S = \{A, B\}$

• $A \cap B \ne \phi \Rightarrow S = \{A \sqcup B\}$, where $A \sqcup B$ is the cover of $A$ and $B$.

The set of maximal convex subintervals of a non-convex time interval $D$ is the set of maximal convex subintervals of all its convex members $\{D_i\}$. □
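
The construction of the maximal convex subintervals amounts to merging intersecting intervals into their covers. A minimal Python sketch follows, assuming closed intervals given as (start, end) pairs, so that intervals that merely touch are treated as intersecting. The function name is hypothetical.

```python
def maximal_convex_subintervals(intervals):
    """Merge intersecting (start, end) intervals into their covers, left to right."""
    merged = []
    for start, end in sorted(intervals):
        if merged and start <= merged[-1][1]:
            # The interval intersects the last cover: extend the cover.
            merged[-1] = (merged[-1][0], max(merged[-1][1], end))
        else:
            # Disjoint from everything seen so far: it starts a new subinterval.
            merged.append((start, end))
    return merged


windows = [(5, 8), (1, 3), (2, 4), (8, 9)]
subintervals = maximal_convex_subintervals(windows)
```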

Having defined the set of maximal convex subintervals, one can phrase the condition for preemptive schedulability.

Condition 2 Let the incoming time constraint be a time constraint with an occurrence window $TC_{in}$ and a non-convex computation requirement $P_{in}$. Let $V$ be the verification interval, derived from the duration of a time constraint $TC_x$, for which $TC_x = TC_{in}$ or $TC_x \supset TC_{in}$, such that for $I^o$, the set of already accepted time constraints that intersect with the verification window $V$, there exists no time constraint that contains $TC_x$. Let $I_V$ be

$$I_V = I^o \cup \{TC_{in}\} - \{TC_x\}.$$




The  incoming  time  constraint  is  preemptively  schedulable  if 

va,6  5(Jv):  J2  £  ll*.nf||  <  U«.H 

vi-Tc,eiy  yfc:Pi(‘)6Pi 

A 

E  iM^ni-  Y1  n^ifc,ii 

V«.€S(/r) 

where $i_a$ are the convex subintervals of $S(I_V)$. □

The schedulability conditions, Conditions 1 and 2, establish the conditions for an object's allocatability.

Condition 3 An object $p$ is allocatable if it is schedulable, its resource requirements are schedulable, and
for  each  of  its  service  requirements  there  is  at  least  one  allocatable  service  alternative  (in  case  the  set  of  its 
service  requirements  is  not  an  empty  set).  □ 

The  above  definition  is  recursive,  implying  that  there  must  exist  at  least  one  object  with  an  empty  service 
requirement  set  in  the  reachability  set  of  each  allocatable  object. 
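
The recursive nature of the allocatability condition can be made concrete with a short Python sketch. It assumes an acyclic dependency graph and takes schedulability as given: the `schedulable` set stands in for the outcome of the schedulability conditions and is purely hypothetical, as are the names used.

```python
def allocatable(name, graph, schedulable):
    """Recursive allocatability test.

    graph maps an object name to (resource requirements, service requirements),
    where each service requirement is a list of alternative object names.
    """
    resources, services = graph[name]
    if name not in schedulable:
        return False
    if any(r not in schedulable for r in resources):
        return False
    # Every service requirement needs at least one allocatable alternative.
    # With no service requirements the recursion stops and the answer is True.
    return all(
        any(allocatable(alt, graph, schedulable) for alt in alternatives)
        for alternatives in services
    )


graph = {
    "p": (["cpu"], [["q1", "q2"]]),
    "q1": (["disk"], []),
    "q2": ([], []),
}
ok = allocatable("p", graph, schedulable={"p", "cpu", "q2"})
```

In the example, q1 cannot be used because neither it nor its disk is schedulable, but the second alternative q2 satisfies the requirement.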

We  now  define  the  resilience  of  an  allocated  computation  to  transient  faults  and  to  monotonic  faults. 
But  in  order  to  do  so,  we  first  define  two  special  subgraphs  of  an  allocated  dependency  graph. 

Definition  6  An  allocation  graph  of  an  object  is  a  sub-graph  of  the  object's  dependency  graph  in  which  only 
allocatable  objects  and  the  schedulable  resources  are  represented. 

Note  that  when  the  allocation  graph  of  object  p  includes  the  object  p  itself,  it  contains  also  all  the  resource 
requirements  and  all  the  service  requirements  of  p,  due  to  the  allocatability  property. 

Definition  7  An  allocation  alternative  of  an  object  is  a  sub-graph  of  the  object's  allocation  graph  in  which 
for  every  service  requirement  only  one  service  alternative  is  represented. 

Due  to  the  definition  of  the  allocation  graph,  the  service  alternatives  represented  in  the  allocation  alter¬ 
native  are  obviously  allocatable. 

Condition 4 An allocation for the execution of object $p$ is $n$-resilient to monotonic faults if $p$ is allocatable, and there exist at least $n + 1$ distinct allocation alternatives whose intersection with each other contains at most the node $p$. □

It  should  be  emphasised  that  this  condition  does  not  allow  any  resource  requirement  of  p  or  any  service 
requirement  of  p  to  be  contained  in  the  intersection. 

Definition 8 An allocatable instance of an allocation alternative $A_p$ of an object $p$ is the tuple $(A_p, TC_p)$, where $TC_p$ is the particular time constraint reserved for this allocation alternative.

Using  the  above  definition  and  recalling  that  a  physical  redundancy  is  also  a  temporal  redundancy,  we 
have the following condition.

Condition 5 An allocation is $n$-resilient to transient faults if the computation is $k$-resilient to monotonic faults, and each of its $k + 1$ allocation alternatives, $0 \le i \le k$, has $r_i$ distinct allocatable instances, such that

$$\sum_{i=0}^{k} r_i > n.$$

The practical implication of the above condition can be stated informally in terms of the following allocation philosophy. An allocator may be required to achieve a given resilience to transient faults when the number of distinct allocatable instances of a given allocation alternative cannot support it. In that case, allocating additional allocatable instances of another allocation alternative is an adequate choice for this allocator.
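
The two resilience conditions can be checked on small examples with the following Python sketch. It represents each allocation alternative as the set of node names it contains, uses a brute-force search that is only practical for a handful of alternatives, and assumes that the number of allocatable instances per alternative is already known. All names are hypothetical.

```python
from itertools import combinations


def monotonic_resilience(root, alternatives):
    """Largest n such that n + 1 alternatives pairwise share at most the root node.

    Returns -1 when there is no alternative at all.
    """
    best = -1
    for size in range(1, len(alternatives) + 1):
        for subset in combinations(alternatives, size):
            if all((a & b) <= {root} for a, b in combinations(subset, 2)):
                best = max(best, size - 1)
    return best


def transient_resilience(instances_per_alternative):
    """Largest n that the total number of allocatable instances strictly exceeds."""
    return sum(instances_per_alternative) - 1


# Two redundant threads that share every resource versus threads with separate resources.
shared = [{"p", "b1", "r1", "r2"}, {"p", "b2", "r1", "r2"}]
separate = [{"p", "b1", "r1"}, {"p", "b2", "r2"}]
n_shared = monotonic_resilience("p", shared)
n_separate = monotonic_resilience("p", separate)
```

The first case corresponds to redundant objects that still depend on common resources, which yields no resilience to monotonic faults, while separating the resources yields a 1-resilient allocation.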

3  Allocation  Algorithm 

In  this  section  we  introduce  our  allocation  algorithm,  based  on  the  definitions  and  conditions  we  have 
introduced  above.  But  before  providing  the  detailed  algorithm,  we  introduce  the  principles  according  to 
which  it  works.  In  addition,  a  condensed  version  of  the  algorithm  is  provided  for  a  better  understanding  of 
the  principles. 

Considering the dependency graph defined in section 2.3, we call a node whose service requirement set in its dependency set is empty a leaf node. Recall that in the dependency graph the nodes are executable objects to be allocated. As we will see later, a leaf node plays a special role in this allocation algorithm, and is in charge of generating the "yes" answers.

The state of an executable object during allocation can be allocatable or non-allocatable. An executable object is allocatable when it satisfies the allocatability condition as specified in Condition 3. Even when an executable object is non-allocatable, it is assumed to be capable of responding to the algorithm performed by the allocator.

We assume that the allocation algorithm is performed by allocators, each of which is invoked to test the satisfaction of Condition 3 by a particular object². Therefore, we start by defining the invocation messages used in the algorithm, and then go on to describe the principles of the algorithm.

3.1  Message  Types  Used 

• ALLOCATE(from, whom, TC, physical_redundancy, temporal_redundancy, to) is the initiator message:

from - initiating object Id.
whom - set of alternative object_SAPs to be allocated.
TC - time constraint.
physical_redundancy - degree of physical redundancy.
temporal_redundancy - degree of temporal redundancy.
to - receiving allocator Id.

• ALLOC_REQ(of, tag, from, whom, level, TC, to) is the query message:

of - initiator Id.
tag - tag number of this of's computation session.
from - object_SAP that requests the service.
whom - object_SAP whose service is requested.
level - degree of temporal redundancy requested.
TC - time constraint.
to - receiving allocator Id.

• ALLOC_REP(color, of, tag, from, whom, Δlevel, TC, to) is the feedback message:

color - yes/no.
of - initiator Id.
tag - tag number of this of's computation session.
from - object_SAP that replies.
whom - object_SAP which requested the service from from.
Δlevel - degree of temporal redundancy in debt.
TC - time constraint.
to - receiver Id.

²This assumption does not restrict the generality of the algorithm, but rather enriches its possible implementations. Allocators can be different instances of the same allocator (e.g. a recursive call), or different allocators executing concurrently.
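
For readers who prefer a concrete representation, the three messages can be modelled as Python dataclasses. This is a sketch only: the field `from` is renamed to `sender` because `from` is a reserved word in Python, the time constraint is assumed to be a simple (start, end) window, and the helper `reply_to` is hypothetical.

```python
from dataclasses import dataclass
from typing import Tuple

TimeConstraint = Tuple[float, float]  # assumed (window start, window end)


@dataclass(frozen=True)
class Allocate:
    sender: str                  # "from": initiating object Id
    whom: Tuple[str, ...]        # alternative object_SAPs to be allocated
    tc: TimeConstraint
    physical_redundancy: int
    temporal_redundancy: int
    to: str                      # receiving allocator Id


@dataclass(frozen=True)
class AllocReq:
    of: str                      # initiator Id
    tag: int                     # tag of this computation session
    sender: str                  # "from": object_SAP that requests the service
    whom: str                    # object_SAP whose service is requested
    level: int                   # temporal redundancy requested
    tc: TimeConstraint
    to: str


@dataclass(frozen=True)
class AllocRep:
    color: bool                  # True for yes, False for no
    of: str
    tag: int
    sender: str                  # "from": object_SAP that replies
    whom: str                    # object_SAP that requested the service
    delta_level: int             # temporal redundancy in debt
    tc: TimeConstraint
    to: str


def reply_to(req: AllocReq, color: bool, delta_level: int, to: str) -> AllocRep:
    """Build the reply to a request: requester and replier swap roles."""
    return AllocRep(color, req.of, req.tag, req.whom, req.sender, delta_level, req.tc, to)


req = AllocReq("ROOT", 1, "ROOT", "p1", 2, (0.0, 10.0), "allocator")
rep = reply_to(req, True, 0, "allocator")
```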


3.2  Principles  of  Algorithm  for  Initiator 

The  following  algorithm  is  implemented  as  an  interface  between  the  user  who  wants  to  initiate  an  allocation 
session  and  the  allocator.  It  can  be  a  special  service-access-point  of  the  allocator,  or  a  dedicated  server  of 
another  type.  When  one  initiates  an  allocation  session,  one  must  specify  its  fault  tolerance  objectives,  its  set 
of  alternatives  in  which  the  computation  can  be  carried  out,  and  the  timing  constraints  for  this  computation. 
The  initiating  algorithm  tries  to  reach  the  fault  tolerance  objectives  by  requesting  allocation  of  computation 
alternatives  (from  the  set  defined  above)  that  adhere  to  the  physical  and  temporal  redundancy  defined 
by  the  user,  as  well  as  to  the  timing  constraints.  Tagging  the  alternatives  allows  concurrent  allocation  of 
dependency  graphs  while  maintaining  null  intersection  between  these  graphs,  as  long  as  the  computation  Id 
and  the  graph  tag  are  spread  with  the  requests  throughout  the  graph. 

Therefore, the initiator (me) must send enough ALLOC_REQ(...) messages to allocate members of the alternative set defined by "whom", and me then has to wait for the answers. In order to have a higher degree of concurrency, an artificial object-joint is created instead of keeping me active while waiting (recalling that me can be an allocator), to collect the answers when they arrive, and to allow choosing another alternative when the answer is negative.

Decrease of physical redundancy is implicitly prevented by the algorithm. The physical redundancy is controlled through the INSERT_TC function (see section C.1), which does not reserve in a particular calendar two requests with the same Id and different tags. This property adheres to the null intersection requirement in Condition 4.

• Upon receiving ALLOCATE(from, whom, TC, physical_redundancy, temporal_redundancy, me)::

1. Create an object (ROOT) whose dependency set consists of an empty set of resource requirements, and a set of service requirements whose cardinality equals the physical redundancy level required + 1. Distribute the alternatives of whom into these service requirements.

2. For every service requirement in ROOT do:

- Select the first service alternative in the service requirement.

- Send ALLOC_REQ for allocating the selected service alternative, distinguishing each service requirement with a different tag. The ALLOC_REQ asks for the temporal redundancy required, imposes the requested time constraint, and designates ROOT as the initiator of the allocation request.
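
The two initiator steps can be sketched in Python as follows. The text does not say how the alternatives are distributed among the service requirements, so the round-robin dealing used here is an assumption, and the dictionary layout of a request is hypothetical.

```python
def distribute(alternatives, physical_redundancy):
    """Deal the alternatives round-robin into physical_redundancy + 1 service requirements."""
    count = physical_redundancy + 1
    if len(alternatives) < count:
        raise ValueError("not enough alternatives for the requested physical redundancy")
    requirements = [[] for _ in range(count)]
    for index, alternative in enumerate(alternatives):
        requirements[index % count].append(alternative)
    return requirements


def initial_requests(alternatives, tc, physical_redundancy, temporal_redundancy):
    """One ALLOC_REQ per service requirement of ROOT, each with its own tag."""
    requests = []
    for tag, requirement in enumerate(distribute(alternatives, physical_redundancy), start=1):
        requests.append({
            "of": "ROOT",
            "tag": tag,
            "from": "ROOT",
            "whom": requirement[0],        # first service alternative
            "level": temporal_redundancy,
            "tc": tc,
        })
    return requests


requests = initial_requests(["p1", "p2", "p3"], (0.0, 10.0), 1, 2)
```

The alternatives left over in each service requirement (p3 in this example) remain available as fallbacks if the first choice answers no.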

3.3  Principles  of  Algorithm  for  Allocator 

The following algorithm is implemented in all instances of an allocator object in the system. It consists of actions responding to an ALLOC_REQ(..., me) message (allocation request), and actions responding to an ALLOC_REP(..., me) message (allocation reply).

An executable object (whom) for which an allocator receives an ALLOC_REQ(..., whom, ..., me) message must have the schedulability property for itself and for its resources for each of its "to-be-executed" instances. If it is schedulable, it forwards ALLOC_REQ(...) messages to allocate the service requirements in its dependency set. This forward wave of ALLOC_REQ(...) messages propagates until a requesting message reaches either an executable object which is non-schedulable or a leaf executable object which has no service requirements.

The timing constraints sent in the ALLOC_REQ(...) messages to the service requirements and the timing requests imposed on the resource requirements are projections of the incoming timing constraint. These projections are done according to the required temporal relations between the invoker's constraint and those imposed on the requirements. We assume that these relations are known in advance, and that they are convergent, as defined below.

Definition  9  A  convergent  temporal  relation  sequence,  it  a  sequence  of  temporal  relations  (Zxyi. . %VI) 
that  satisfies 

xZxyy . . .  Zyxx  <  x  V  xRxy y . . .  Zyxx  =  x  V  xZxyy . . .  Zyxx  |  x  V  xkxyy . . .  Zyxx  1  x 
for  time  intervals  x,  y.  □ 

Now we examine how the ALLOC_REP(...) messages are generated. If an executable object whom is requested to be allocated, and it verifies itself or its resources to be non-schedulable, then there is no point in verifying the allocatability of its service requirements. It generates an ALLOC_REP(no, ...) message to the object which requested its service. On the other hand, if a leaf object whom is requested to be allocated, and it verifies itself and its resources to be schedulable, then, having no service requirements, it generates an ALLOC_REP(yes, ...) message to the object which requested its service.

The backward wave of ALLOC_REP(...) messages propagates in the following way. If both an executable object and its resource requirements have been found schedulable, and if this object has received all the answers it expected with a positive color, then it sends back a positive answer message ALLOC_REP(yes, ..., prev, ...) to the object that had requested its services. Thus, each node performs a boolean AND of all the positive answers. On the other hand, if a requesting object exhausts all the alternatives for any particular service request, then it cannot meet its requirements, and it sends back a negative answer message ALLOC_REP(no, ..., prev, ...). Since in the latter case some services might have already been reserved (in particular the object itself and the resources), these reservations must be removed to release them for other possible requests.
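
The forward and backward waves can be approximated by a synchronous, recursive Python sketch. It is a simplification under several assumptions: message passing is replaced by function calls, the calendar of each object is reduced to a count of free slots (resources are folded into that count), and reservations in excess of the finally agreed level are not released. All names are hypothetical.

```python
def allocate(name, level, objects, free, log):
    """Try to reserve up to `level` instances of `name`; return the level achieved (0 means "no").

    objects maps a name to its service requirements (lists of alternatives),
    free maps a name to the number of instances its calendar can still accept,
    log records (name, count) reservations so that they can be released.
    """
    checkpoint = len(log)
    my_level = min(level, free[name])
    if my_level == 0:
        return 0                                  # non-schedulable: negative reply
    free[name] -= my_level
    log.append((name, my_level))
    for alternatives in objects[name]:            # forward wave, one service requirement at a time
        achieved = 0
        for alt in alternatives:                  # the next alternative covers the remaining debt
            if achieved >= my_level:
                break
            achieved += allocate(alt, my_level - achieved, objects, free, log)
        if achieved == 0:                         # alternatives exhausted: release and answer "no"
            for reserved, count in log[checkpoint:]:
                free[reserved] += count
            del log[checkpoint:]
            return 0
        my_level = min(my_level, achieved)        # the weakest requirement limits the reply
    return my_level                               # positive reply (a leaf returns here directly)


objects = {"p": [["q1", "q2"], ["r"]], "q1": [], "q2": [], "r": []}
free = {"p": 2, "q1": 1, "q2": 1, "r": 2}
achieved = allocate("p", 2, objects, free, [])
```

In the example, q1 can only supply one of the two requested instances, so the second alternative q2 is asked for the missing one, and the request for p succeeds at level 2.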

• Upon receiving ALLOC_REQ(of, tag, from, whom, temporal_redundancy_level, TC, me)::

1. Iterate my_level successful iterations, trying to reserve an execution interval for whom in its calendar, and for its resource requirements in their calendars. The number of iterations is bounded by the required temporal redundancy level.

2. If no iteration was successful, send ALLOC_REP answering no.

3. Otherwise, if whom is a leaf-object (having no service requirements), send ALLOC_REP answering yes, indicating how many temporal redundancy instances are missing according to my_level.

4. Otherwise (not being a leaf-object) do the following for every service requirement in whom's dependency set.

- Select the first service alternative in the service requirement.

- Send ALLOC_REQ for allocating the selected service alternative, asking for the temporal redundancy my_level, projecting the proper time constraint according to the temporal relation between whom and the service.

5. Update whom's joint to include the proper information needed to deal with replies.

• Upon receiving ALLOC_REP(color, of, tag, from, whom, Δlevel, TC, me)::

1. If the color is yes, and all the required temporal redundancy instances have been allocated, then mark this service requirement done.

2. Otherwise, not having enough temporal redundancy instances, if there is another possible service alternative in the service requirement, do the following.

- Select the next service alternative in the service requirement.

- Send ALLOC_REQ for allocating the selected service alternative, requiring the unsatisfied temporal redundancy level (up to my_level), projecting the proper time constraint according to the temporal relation between whom and the service.

3. However, if there are no more service alternatives at that service requirement, the following two cases are distinguished.

- If no alternative at all at that requirement has been allocated, then send ALLOC_REP answering no to the object that required the service of whom. In that case release whom, its resources, and the rest of the requirements.

- If some alternatives at that requirement have been allocated, then decrease the level of temporal redundancy viewed by whom to the lowest between its current view and the view seen by from. Then, mark this requirement as done.

4. If all service requirements are done, send an ALLOC_REP with a positive answer to the object that required the service of whom, indicating the level of temporal redundancy as limited by whom's view or its requirements.

We note the way in which the degree of temporal redundancy is maintained, in order to satisfy Condition 5. The temporal redundancy achieved by the object itself and its resources is bounded by the one requested from the service requirements. If a service alternative cannot satisfy the degree required by a requestor object, an additional alternative is invoked to satisfy the debt, and so on as long as there are alternatives. The sum of the redundancy achieved by the alternatives of a service requirement establishes the degree of that service requirement. The lowest degree achieved by a member of the service requirements is the one reserved, and the requestor is informed about the debt. That way the requestor can try to increase the degree by requesting another alternative. The principle here is to use physical redundancy when no more temporal redundancy can be achieved.
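
The bookkeeping described above reduces to a sum over the alternatives of each service requirement followed by a minimum over the requirements. A small Python sketch is given below. The function names are hypothetical, and the debt is expressed as a non-negative number for readability, which is a choice made for this sketch.

```python
def requirement_degree(alternative_levels):
    """Degree of one service requirement: the sum achieved by its alternatives."""
    return sum(alternative_levels)


def reserved_degree(own_level, requirement_levels):
    """Degree actually reserved: the lowest of the object's own level and its requirements' degrees."""
    degrees = [requirement_degree(levels) for levels in requirement_levels]
    return min([own_level] + degrees)


def debt(requested, reserved):
    """Temporal redundancy still owed to the requestor."""
    return requested - reserved


# The object reserved 3 instances; its first requirement achieved 2 + 1, its second only 2.
reserved = reserved_degree(3, [[2, 1], [2]])
owed = debt(3, reserved)
```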

3.4  Local  and  External  Variables 

In the algorithm presented here we use some of the variables defined for the joint of an object (see Appendix A) and the following local variables:

my_level: the degree of temporal redundancy of this object so far.
myΔlevel: the debt in temporal redundancy of this object.
Δlevel: the debt in temporal redundancy of the service requirement.
I_M_O_K: true as long as this object's schedulability is not contradicted.
R_is_O_K: true as long as these resources' schedulability is not contradicted.
$TC_{me}$: time constraint of this object.
$R_i$: a resource requirement.
$\mathcal{R}_{R_i}$: the temporal relation between this object and resource requirement $R_i$.
$TC_i$: the time constraint of the requirement as projected from $TC_{me}$ using the temporal relation $\mathcal{R}_{R_i}$.
$S_i$: a service requirement.
$\mathcal{R}_{S_i}$: the temporal relation between this object and service requirement $S_i$.
$s_j^{(i)}$: a service alternative of service requirement $S_i$.

($k$, $n$ are the number of requirements for resources and services, respectively.)

3.5  The  Allocation  Algorithm 

Upon receiving ALLOCATE(from, whom, TC, physical_deg, temporal_deg, me)::

begin
  Let whom be associated with {p_1, ..., p_k} with k ≥ physical_deg ;
  Construct a non-volatile auxiliary object ROOT with the following:
  1. DS_{ROOT, tc_ROOT} = { {R_i^(ROOT) : 1 ≤ i ≤ k}, {S_i^(ROOT) : 1 ≤ i ≤ n} }
     where {R_i^(ROOT) : 1 ≤ i ≤ k} = ∅
     and S_i^(ROOT) = { <s_j^(ROOT,(i)), TC> : s_j^(ROOT,(i)) ∈ {p_1, ..., p_k} }.
  2. Id ← ROOT.
  3. prev ← from.
  4. TC_me ← TC.
  5. myΔlevel ← 0.
  6. s[i] ←  , for 1 ≤ i ≤ physical_deg .
  7. Ans[i] ← off, for 1 ≤ i ≤ physical_deg .
  for tag ← 1 to physical_deg step 1 do
    send ALLOC_REQ(ROOT, tag, ROOT, s^(ROOT)[tag], temporal_deg, TC, allocator) ;
  od
end


Upon receiving ALLOC_REQ(Id, tag, from, whom, temporal_deg, TC, me)::

begin
  my_level ← 0 ; I_M_O_K ← true ;
  TC_me ← construct(whom, TC, Id, tag) ;
  TC_me.level ← my_level ; TC_me.state ← idle ;
  while (my_level < temporal_deg) ∧ (I_M_O_K) do
    /* Temporal redundancy reservations */
    if INSERT_TC(whom, TC_me) then
      /* Reserve necessary resources for whom */
      i ← 1 ; R_is_O_K ← true ;
      while (i ≤ k) ∧ (R_is_O_K) do
        TC_i ← project(ℜ_{R_i}, TC_me) ;
        TC_i.level ← my_level ;
        if INSERT_TC(R_i, TC_i) then
          i ← i + 1 ;
        else /* cannot get them all: release guaranteed subset */
          R_is_O_K ← false ;
          for q ← 1 to i step 1 do
            REMOVE_TC(R_q, TC_q) ;
          od
          I_M_O_K ← false ;
          REMOVE_TC(whom, TC_me) ;
        fi
      od /* resource reservation terminated */
    else
      I_M_O_K ← false ;
    fi
    if (I_M_O_K) then
      my_level ← my_level + 1 ;
      TC_me.level ← my_level ;
    fi
  od
  myΔlevel ← my_level − temporal_deg ;
  if (my_level = 0) then
    send ALLOC_REP(no, Id, tag, whom, from, myΔlevel, TC_me, allocator) ;
  else /* my_level > 0, something was reserved */
    if (∀S_i ∈ DS^(whom) : S_i = ∅) then /* leaf-object */
      send ALLOC_REP(yes, Id, tag, whom, from, myΔlevel, TC_me, allocator) ;
    else /* non-leaf-object: invoke allocation of service requirements */
      prev ← from ;
      for i ← 1 to n step 1 do
        TC_i ← project(ℜ_{S_i}, TC_me) ;
        send ALLOC_REQ(Id, tag, whom, s_1^(whom,(i)), my_level, TC_i, allocator) ;
        s[i] ← s_1^(whom,(i)) ; Ans[i] ← off ;
      od
      /* In case allocator is reentrant: store in whom joint */
      store Id, prev, TC_me, myΔlevel, s[i = 1, ..., n], Ans[i = 1, ..., n] ;
    fi fi
end




Upon receiving ALLOC_REP(color, Id, tag, from, s_j^(whom,(i)), Δlevel, TC, me)::
begin
  Restore auxiliary variables according to Id, tag, s_j^(whom,(i)) ;
  /* prev, TC_me, myΔlevel, s[i = 1, ..., n], Ans[i = 1, ..., n] */
  if (color = yes) then
    if (Δlevel = 0) then
      Ans[i] ← done ;
    else /* Δlevel < 0 : more alternatives are needed, some already reserved */
      Ans[i] ← on ;
    fi fi
  if (Δlevel < 0) ∧ (j < M^(whom,(i))) then
    TC_i ← project(ℜ_{S_i}, TC_me) ;
    send ALLOC_REQ(Id, tag, whom, s_{j+1}^(whom,(i)), −Δlevel, TC_i, allocator) ;
    store s[i] ← s_{j+1}^(whom,(i)) ;
  elseif (Δlevel < 0) ∧ (j = M^(whom,(i))) ∧ (Ans[i] = on) then
    myΔlevel ← min(myΔlevel, Δlevel) ;
    Ans[i] ← done ;
  elseif (Δlevel < 0) ∧ (j = M^(whom,(i))) ∧ (Ans[i] = off) then
    /* no alternative reserved: release other requirements */
    send UNLOAD(whom, TC_me) ;
    /* see section C.3 */
    if (Id ≠ whom) then /* climb up to try again */
      send ALLOC_REP(no, Id, tag, whom, prev, myΔlevel, TC_me, allocator) ;
    else /* ROOT failed to be allocated */
      send ALLOC_REP(no, ROOT, tag, s[i], ROOT, myΔlevel, TC_me, prev) ;
    fi
  else
    /* error in algorithm */
  fi
  if (∀i : Ans[i] = done) then
    if (whom ≠ Id) then
      send ALLOC_REP(yes, Id, tag, whom, from, myΔlevel, TC_me, allocator) ;
    else /* ROOT is properly allocated */
      send ALLOC_REP(yes, ROOT, tag, ROOT, ROOT, myΔlevel, TC_me, prev) ;
      /* temporal redundancy debt in myΔlevel */
      delete ROOT ;
    fi fi
end

4 Properties

The  major  properties  of  the  algorithm  are  discussed  in  this  section:  termination  of  the  algorithm,  correctness 
of  allocatability  when  detected  by  the  algorithm,  the  achievement  of  the  fault  tolerance  objectives  when 
allocatability  is  confirmed,  the  mutual  exclusion  support  needed,  and  finally  the  deadlock  prevention. 

4.1  Algorithm  Termination 

We start examining the algorithm’s properties by considering the possibility of a non-terminating allocation
session when allocating a particular object, p, with a finite reachability set. The finite number of
members in p’s dependency set implies that there is a finite path from p to any member q in the set, and
therefore an ALLOC_REQ(...) message sent from p reaches q within a finite time. In addition, the finite
reachability set implies a finite number of requirements. Furthermore, the finite reachability set implies that
every path is either finite, or a closed component, or a finite path ending with a closed component.
These are the cases we examine below.

Consider first a path that starts at p, passes through one of its requirements, and is finite. It must eventually
reach a leaf node that has no requirements. There, the forward wave of ALLOC_REQ(...) messages is stopped,
generating an ALLOC_REP(...) message that returns on the same path used by the forward wave.

If p and its requirement q are both members of a closed component, then the following must occur. Object
p inserts its incoming constraint $TC_p^{(0)}$ into its calendar, reserving an interval $P_p^{(0)}$ within this allowed
window of occurrence. Then object p projects its incoming time constraint $TC_p^{(0)}$ into a constraint $TC_q^{(0)}$.
Object q then inserts $P_q^{(0)}$ into its calendar, and passes the request on. The request continues and returns
to p, since it is a closed component. The restriction on convergent temporal relations implies that the
new arrival of the allocation request comes with a time constraint $TC_p^{(1)}$ which is contained within $TC_p^{(0)}$.
Now p reserves another time interval $P_p^{(1)}$, which is of the same duration as $P_p^{(0)}$ and is definitely contained
within $TC_p^{(0)}$. The same argument holds for the following occurrences of forwarding the ALLOC_REQ(...)
messages. Note that the finite interval $TC_p^{(0)}$ can allow only a finite number of $P_p^{(i)}$ intervals to be reserved
within its limits. Once this finite number is reached and another reservation request within this window
of occurrence arrives, p is no longer schedulable (either preemptively or non-preemptively). When this
occurs, a negative reply ALLOC_REP(no, ...) is sent back, and the forward wave is stopped.
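
The bound used in this argument can be made concrete with a small sketch. It assumes, for illustration only, that the window and the reserved intervals are integer-valued and that each visit of the request reserves one more interval of the same duration; the function name is hypothetical.

```python
def visits_until_refusal(window, duration):
    """Count the reservations p can make inside its window before it
    has to answer a further request with a negative reply."""
    start, end = window
    reserved = 0
    cursor = start
    while cursor + duration <= end:  # room for one more interval
        cursor += duration
        reserved += 1
    return reserved


print(visits_until_refusal((0, 10), 3))  # 3
```

With a window of length 10 and intervals of length 3, the fourth arrival of the request at p cannot be scheduled, so the forward wave stops there.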

The third case, a finite path that ends in a closed component, is a combination of a finite delivery of
ALLOC_REQ(...) followed by the above scenario.

Due to the above, we can conclude that within a finite time the forward wave terminates, and only
backward wave messages exist in the allocation session. Since each backward wave message uses the same
path its corresponding forward wave message has passed, only in the reverse direction, we can conclude
that within a finite time every ALLOC_REQ(...) to an alternative is answered by either a positive or
a negative ALLOC_REP(...). Having a finite number of alternatives for every requirement, and a finite
number of requirements, yields the conclusion of the algorithm within a finite time.

Proposition 1 The allocation algorithm terminates if the reachability set of ROOT is finite. □

In the case the graph is infinite, the algorithm also terminates due to an implicit timeout mechanism which
is attached to the allocator. The dependence of allocatability on each object’s and resource’s schedulability
hides this timeout. A time constraint is not verified to be schedulable if its latest begin time has already
passed. In such a case a negative reply, ALLOC_REP(no, ...), is generated and forwarded to the
originator. Therefore, we can state the following, expanding Proposition 1.

Proposition  2  The  search  for  an  allocation  always  terminates. 

We  can  conclude  by  saying  that  since  the  algorithm  always  terminates,  either  by  a  normal  termination 
or  by  a  timeout,  the  answer  to  the  originator  will  be  generated. 
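
The implicit timeout can be illustrated with a toy model. The assumptions here are ours: every hop of the forward wave costs a fixed, positive integer delay, the chain of requirements is unbounded, and a request is refused as soon as the latest begin time has passed. The names `forward_wave`, `latest_begin`, and `hop_delay` are hypothetical.

```python
import itertools


def forward_wave(latest_begin, hop_delay, start=0):
    """Return the number of hops after which a request is refused
    because the latest begin time of its constraint has passed."""
    now = start
    for hops in itertools.count():  # unbounded chain of requirements
        if now > latest_begin:      # constraint can no longer be scheduled
            return hops
        now += hop_delay


print(forward_wave(latest_begin=10, hop_delay=3))  # 4
```

Even though the chain never ends, the request is refused after finitely many hops, which is the behavior Proposition 2 relies on.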

4.2  Allocatability  Correctness 

There are only two possible “colors” for reply messages: a positive answer, the ALLOC_REP(yes, ...)
message, and a negative answer, the ALLOC_REP(no, ...) message. The negative answer can be generated
in two cases. The first is the case of a non-schedulable object that receives an ALLOC_REQ(...) message,
where non-schedulability refers to the object itself or to one of its resource requirements. The second is
the case of an object that receives ALLOC_REP(no, ...) answers from all its alternatives for a specific
requirement, and thus is known not to satisfy the allocatability condition. The positive answer is generated
in the case of a schedulable leaf node that receives an ALLOC_REQ(...) message and immediately
answers with an ALLOC_REP(yes, ...) message. The positive answer propagates only when there were
positive replies from all the service requirements of an intermediate schedulable node, where schedulable
again refers to both the object itself and its resource requirements. Thus we can conclude that each object
that sends an ALLOC_REP(yes, ...) message is either a schedulable leaf node, and thus allocatable, or a
schedulable intermediate node that received at least one ALLOC_REP(yes, ...) message from each of its
service requirements, and thus is allocatable. Hence, the following proposition.

Proposition 3 A positive reply from the allocator of ROOT ensures the existence of a non-empty allocation
graph of ROOT. □
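
The allocatability condition argued above (an AND over service requirements of an OR over their alternatives) can be written as a short recursive check. This is a sketch under stated assumptions: the dependency structure is acyclic, `schedulable` already summarizes the object together with its resources, and the dictionary layout and names are hypothetical.

```python
def allocatable(obj, schedulable, requirements):
    """True if obj is schedulable and every one of its service
    requirements has at least one allocatable alternative."""
    if not schedulable[obj]:
        return False
    return all(
        any(allocatable(alt, schedulable, requirements) for alt in alternatives)
        for alternatives in requirements.get(obj, [])
    )


requirements = {"root": [["a", "b"], ["c"]], "a": [["d"]]}
schedulable = {"root": True, "a": True, "b": True, "c": True, "d": False}
print(allocatable("root", schedulable, requirements))  # True
```

Here alternative `a` fails because its only alternative `d` is not schedulable, but `b` satisfies the first requirement and `c` the second, so `root` is allocatable.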

4.3  Achievement  of  Fault  Tolerance  Objectives 

In  section  2.3,  we  have  defined  two  types  of  redundancy,  the  temporal  and  the  physical  redundancy.  Here 
we  expand  on  these  two  concepts,  and  on  their  relations. 

In the algorithm presented in section 3, every request for allocating an object specifies the temporal
redundancy level required. The temporal redundancy level propagates with some restrictions. Assuming
there is no reason to request from a service a higher temporal redundancy level than the one achieved by
the requesting object, each object first attempts to reach the required level itself. If it succeeds, the request
propagates unchanged. Otherwise only a part of the request is forwarded, whose size equals the
level achieved locally (my_level), and a “debt” of size

myΔlevel = my_level − required_level

is generated. Note that by this definition myΔlevel is non-positive. If all the service requirements achieve
the redundancy level forwarded to them, then the local debt (myΔlevel) is reported in the backward wave
message. If, however, an alternative does not succeed in reaching the objectives set for it by the requestor,
another alternative is invoked to increase the temporal level. These alternative invocations continue as long
as the temporal redundancy level for the service requirement has not reached the level achieved locally, and
as long as there are alternatives. If there are no more alternatives, the local level is reduced to the lowest
level achieved by the requirements, and the increased debt is reported in the backward wave message.

The above procedure provides the following result: an allocatable object that answers a request
ALLOC_REQ(..., temporal_deg, ...) positively with an ALLOC_REP(yes, ..., myΔlevel, ...) has reserved
at least temporal_deg + myΔlevel execution instances of itself and its resources, and its service requirements’
answers reported at least that amount. In other words, temporal_deg + myΔlevel distinct allocatable
instances are reserved. Hence the following proposition.

Proposition 4 A positive reply from the allocator of ROOT ensures that Condition 5 holds, satisfying a
resiliency to transient faults of an allocatable ROOT of

temporal_deg + myΔlevel(ROOT). □
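
A small worked example of this arithmetic, following the prose description above rather than the message handlers themselves, may help. The function name and the representation of each service requirement by the level it finally achieved are assumptions made for illustration.

```python
def reported_debt(temporal_deg, my_level, requirement_levels):
    """Debt reported upward: the local level is first reduced to the
    lowest level achieved by any service requirement."""
    final_level = min([my_level, *requirement_levels])
    return final_level - temporal_deg


debt = reported_debt(temporal_deg=3, my_level=2, requirement_levels=[2, 1])
print(debt, 3 + debt)  # -2 1
```

The object was asked for 3 instances, reserved 2 locally, and one of its requirements only reached 1, so the reported debt is −2 and one allocatable instance is guaranteed.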

The physical redundancy is achieved by verifying that there are at least physical_deg allocation graphs
which do not intersect each other, except in ROOT. The non-intersecting nature is achieved by maintaining
that number of distinct tags for the computation with the allocated Id. The INSERT_TC function that is
used to verify it assumes that every object and every resource has a calendar, each of which is maintained
by instances of INSERT_TC and REMOVE_TC. The implementation of the tag separation as service
requirements of ROOT serves two goals. First, physical_deg replies with different tags are combined in a
boolean AND. Second, in case one alternative fails, another one can be chosen for an allocation retrial.

Proposition 5 A positive reply from the allocator of ROOT ensures that Condition 4 holds, satisfying a
resiliency to monotonic faults of an allocatable ROOT of physical_deg. □
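
The non-intersection property behind Proposition 5 can be checked on a finished allocation with a few lines. This is only a verification sketch under the assumption that each tag's allocation graph is given as a set of object names; in the algorithm itself the property is enforced incrementally by INSERT_TC. The function name is hypothetical.

```python
def disjoint_except_root(graphs, root="ROOT"):
    """True if the node sets of the allocation graphs (one per tag)
    share no member other than the root."""
    seen = set()
    for nodes in graphs:
        members = set(nodes) - {root}
        if seen & members:
            return False
        seen |= members
    return True


print(disjoint_except_root([{"ROOT", "a", "b"}, {"ROOT", "c"}]))  # True
print(disjoint_except_root([{"ROOT", "a"}, {"ROOT", "a", "c"}]))  # False
```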

4.4  Complexity 

In  a  wide  variety  of  cases,  real-time  computations  are  composed  of  a  set  of  objects  which  form  a  hierarchical 
structure.  This  hierarchy  is  depicted  by  allowing  each  object  to  be  a  member  of  at  most  one  dependency 
set.  On  the  other  hand,  each  resource  can  be  a  member  of  more  than  one  of  the  resource  requirement  sets 
in  the  computation  session.  This  yields  a  tree-like  structure  for  the  objects  which  are  not  resources.  Thus, 
in  a  graph  representation  of  the  computation,  each  of  these  objects  may  be  connected  to  all  the  resources. 

This  hierarchical  abstraction  is  widely  used  in  real-time  and  object  oriented  systems  due  to  the  autonomy 
it  provides.  This  autonomy  is  reached  by  the  encapsulation  of  functions  into  the  object  and  once  triggered 
the  object  has  no  need  for  additional  external  stimuli.  Therefore,  the  liveness  of  the  invoking  object  is  not 
a  necessary  condition  to  the  successful  completion  of  the  invokees. 

Using the above graph representation, we now show a worst-case analysis of the algorithm’s complexity. We
isolate the set of server objects $S$, such that they form a tree structure. The algorithm presented in section
3 traverses $|S| - 1$ arcs due to server dependency. Each server, in turn, may require all the resources in the
resource set $R$. Therefore, the algorithm traverses $|S| \cdot |R|$ arcs in the graph due to server/resource relations.
These are the only possible arcs to traverse in the above model. Hence, for hierarchical computations the
complexity of the algorithm is:

$|S| \cdot |R| + |S| - 1.$
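
The count can be checked on a concrete hierarchy. The sketch below assumes the worst case described above (every server requires every resource) and represents the server tree by a hypothetical parent map; the names are illustrative.

```python
def count_arcs(parent, resources):
    """Arcs traversed for a server tree in which every server
    requires every resource (the worst case)."""
    server_arcs = sum(1 for p in parent.values() if p is not None)
    resource_arcs = len(parent) * len(resources)
    return server_arcs + resource_arcs


parent = {"root": None, "a": "root", "b": "root", "c": "a"}
print(count_arcs(parent, ["cpu", "disk"]))  # 11
```

With 4 servers and 2 resources this gives 4 · 2 + 4 − 1 = 11 arcs.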

The time complexity degrades if computation graphs that contradict the object architecture objectives
are used.

4.5 Mutual Exclusion

The mutual exclusion of object utilization results directly from the use of calendars in each of the (totally)
independent Service Access Points. For each computation session, an Id and a tag are generated in order to
coordinate the different requests for computation. These request identifications are kept in the calendar of
the object. From the algorithm we can see that if two or more requests come to a specific object with the
same time constraints, the selection of the one that will acquire that window is done at allocation time.

All the requests use INSERT_TC and REMOVE_TC to update the calendars. These primitives ensure, in
turn, that only one object has access to the calendar of a particular requested object at a time. Therefore,
at execution time, a specific constraint is reserved for at most one server.

Note that the above holds for convex time intervals as well as for non-convex time intervals.
The space in time is reserved for each convex sub-interval which is a member of the non-convex time interval.
The same holds for periodic jobs: it suffices to reserve the time constraint for each new occurrence of the
periodic job at the end of the previous occurrence.

4.6  Deadlock 

Deadlock avoidance or detection is automatic in the allocation scheme we have presented here. Note that
in this section we are referring to run-time deadlocks, as opposed to allocation-time deadlocks, which have
been shown to be avoided in section 4.1.

Deadlock occurs when there exists a closed component of waiting resources in a computation “wait-for”
graph. Thus, for any deadlock scenario, the wait is for an indefinite period of time. Such a case is not possible
in the allocation scheme presented here, because at run-time object execution is carried out according to a
certain non-overlapping time ordering. Therefore, objects are prevented from waiting for a resource or for
a message from another object. Instead, all the resources are pre-allocated and the invocation mechanism
uses an underlying message passing mechanism which involves no waits. Hence, we state the following.

Proposition  6  There  are  no  run-time  deadlocks. 

5  Reallocation  Algorithm 

5.1  Rationale 

Let the system resources, denoted as

$P = \{R_1, \ldots, R_k\}$,

be connected with a set of physical communication links $L_P$ to form a graph

$G_P = (P, L_P)$.

The dependency set of every object p in the system contains a resource requirement $\{R_i^{(p)} : 1 \le i \le k\}$
such that

$\{R_i^{(p)} : 1 \le i \le k\} \subseteq P$.

Methods have been suggested to partition $G_P$ into clusters of resources used to monitor each other in
order to detect a monotonic failure. Each of these clusters is called a detection unit, denoted $D_i$, and the
participating resources are assumed to communicate with each other through a detection protocol of some kind.
Here no assumption is made about the detection protocol. However, our previous assumption on keeping
calendars in non-volatile storage suggests a possibility of retrieving the guarantees given and not satisfied
by the faulty resource.

Although we have shown that the resiliency objectives are satisfied, we suggest here an enhancement to
allow recovery of the resiliency after a fault occurs. If there are unused resources in the system that can
support the continuation of the execution of a physical redundancy that has failed, there is no reason not
to use them. A reallocation of such an alternative as a substitute for the faulty one can be easily implemented
with the tools described above for the allocation. Retrieving the calendar of the faulty resource (or object)
allows invoking the reallocation with a negative ALLOC_REP, thus triggering the search for another
alternative. If such an alternative is found, ROOT (and thus the owner who requested the computation) is
only informed about the recovery via a positive ALLOC_REP message. Otherwise, ROOT is informed with
a Δlevel that results from the fault.

5.2 Algorithm for Detection Unit D_i

Upon detecting failure (obj_sap: ↑object, TC_in: time_constraint) ::

begin
  inform members of D_i ;
  retrieve calendar(obj_sap) ;
  ∀TC_i ∈ calendar(obj_sap) : do
    Get auxiliary variables according to obj_sap joint ;
    /* Id_i, tag_i, prev_i are restored */
    myΔlevel ← −1 ;
    send ALLOC_REP(no, Id_i, tag_i, prev_i, obj_sap, myΔlevel, TC_i, allocator) ;
  od
end
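
The detection unit's reaction can be sketched as a pure function that turns the retrieved calendar of the failed access point into the negative replies to be sent. The `Reservation` record and the tuple layout of a reply are assumptions made for illustration; the actual message also carries the time constraint and the allocator.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Reservation:
    """Hypothetical view of one calendar entry and its joint variables."""
    comp_id: str  # computation identifier (Id)
    tag: int      # thread indicator of the computation
    prev: str     # requestor recorded in the joint


def failure_replies(failed_sap, calendar):
    """One negative reply per guarantee held by the failed access point,
    each reporting a debt of -1 for the lost instance."""
    return [("no", r.comp_id, r.tag, r.prev, failed_sap, -1) for r in calendar]


calendar = [Reservation("job-1", 1, "parent-a"), Reservation("job-2", 2, "parent-b")]
for reply in failure_replies("sap-x", calendar):
    print(reply)
```

Each reply re-enters the allocation algorithm at the requestor, which then looks for another alternative as described in section 5.1.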


6  Concluding  Remarks 

In real-time systems, the resource management plan, the allocation, must be closely related to the scheduling,
and the two are based on time considerations rather than on a static priority scheme. The allocation
presented here is fault tolerance motivated, to cope with the application’s reliability goals, ensuring a
user-specified resiliency to failures while supporting both temporal redundancy and physical redundancy
requirements. This approach allows monotonic faults and transient faults to be dealt with in distinct manners.

The allocation scheme we propose here accomplishes the hard real-time goal of guaranteeing deadline
satisfaction if the job is accepted. In addition, this allocation scheme supports fault tolerance objectives
in both damage containment and resiliency requirements. It does this in cooperation with a schedulability
verification mechanism, and with an object architecture in which each object has a calendar
that relates time to its execution. A nice feature of this scheme is the way in which it can be
used for reallocation, restoring the resiliency after a failure has occurred.

The model employed in this paper has considered the service requirements of an object as a boolean AND
of a boolean OR of alternatives. This has been done by selecting an alternative for each requirement, while
guaranteeing that all requirements are served before committing. However, other approaches can be
employed with some changes in the algorithm, to support different relations between alternatives according to
the application. Extending the relations may couple alternatives or exclude alternatives, according to an
alternative chosen at another service. Another possible approach is the use of an OR of ANDs instead of
the proposed AND of ORs. Various approaches are planned to be examined in MARUTI, a hard real-time
operating system project that is being implemented at the Computer Science Department, University of
Maryland.


A Time Constraints and Auxiliary Variables in Object’s Joint

type time_constraint = construct
  { Id : computation identifier ;
    tag : thread indicator of computation Id ;
    level : redundancy index ;
    tc : convex_time_interval ;
    back_slack, for_slack : real ;
    P : non_convex_time_interval ;
    freq : real ;
    state : integer } ;

type resource_requirement = construct
  { ℜ_r : temporal relation ;
    ↑resource ;
    R : non_convex_time_interval } ;

type service_alternative = construct
  { ℜ_s : temporal relation ;
    ↑object_SAP ;
    s : non_convex_time_interval } ;

type service_requirement = set of service_alternatives ;
type schedule_type = (preemptive, non_preemptive) ;
type Answer_Wait_Indicator = (off, on, done) ;

var(obj_sap)
  calendar : ordered set of time_constraints ;
  Sched_Type : schedule_type ;
  dependency_set : set of k resource_requirements and n service_requirements ;

aux_var(obj_sap)
  wait_set : set of ↑object_SAP ;
  ∀ prev ∈ wait_set :
    Id : computation identifier ;
    tag : thread indicator of computation Id ;
    TC_me : time_constraint ;
    myΔlevel : redundancy index ;
    s[n] : set of n service_requirements ;
    Ans[n] : set of n Answer_Wait_Indicators ;


B  Detailed  Review  of  Allocation  Algorithms 

B.1 Bottleneck Load Minimization Allocation Algorithm

Algorithm  P-I-A  (  PR,  IMC,  AET)  [2]  ; 


begin 

/•  Init  */ 

AET  -  $  £>:«i  Tj  ;  /*  Av  */ 

PL  «—  j  2/:* i  ^y  I  /*  Av  Processor  load  */ 

7/mc (t,j)  —  ,  1  IMC  index  */ 

lPR[i,j) «-  1  -  R(Pi,y)  ,  1  <  »'■><  ^  ;  /*  PR  in<i«  */ 

  /* Iterate */
  for α ← α_1 to α_2 step Δα do
    for β ← β_1 to β_2 step Δβ do
      /* PHASE I: combine modules with high IMC */
      /* in groups to reduce sum of processor loads */
      List ← Sort (p_i, p_j) pairs in descending IMC order ;
      G_j ← {p_j} , 1 ≤ j ≤ J ;
      while List ≠ ∅ do
        pop (p_i, p_j) from top of List ;
        List ← List − {(p_i, p_j)} ;
        if α × I_IMC(i,j) + I_PR(i,j) > 0 then
          search (s : p_i ∈ G_s , t : p_j ∈ G_t) ;
          if (s ≠ t) ∧ (T_s + T_t ≤ PL × β) then /* combine */
            G_s ← G_s ∪ G_t ; G_t ← ∅ ;
            T_s ← T_s + T_t ; T_t ← 0 ;
          fi ;
        fi ;
      od ;
      /* PHASE II: assign module groups to processors */
      /* and exhaustively search for smallest BOTTLENECK */
      Γ ← {G_j : 1 ≤ j ≤ J} ;
      Z(X) ← min_X { max_{1≤r≤S} { AET(P_r ; X) + IMC(P_r ; X) } } ;
      record Z(X) ;
    od ;
  od ;
end ;

B.2 Next-Fit-M Allocation Algorithm

The algorithm in [4] uses the following variables:

• T_i: task, 1 ≤ i ≤ n.

• u_i: utilization factor (duty cycle) of T_i.

• P_{i,j}: set of tasks assigned to a processor.

• N_k: number of class-k processors used so far.

Algorithm Next-Fit-M ;

begin
  for k := 1 to M step 1 do
    N_k := 1 ;
  od ;
  for i := 1 to n step 1 do
    k := classify(T_i) ;
    /* returns k for 2^(1/(k+1)) − 1 < u_i ≤ 2^(1/k) − 1, for 1 ≤ k < M */
    /* returns M for 0 < u_i ≤ 2^(1/M) − 1. */
    if (1 ≤ k < M) then
      P_{k,N_k} := P_{k,N_k} ∪ {T_i} ;
      if |P_{k,N_k}| = k then
        N_k := N_k + 1 ;
      fi ;
    else /* (k = M) */
      if (Σ_{T_j ∈ P_{M,N_M}} u_j) > (ln 2 − u_i) then
        N_M := N_M + 1 ;
      fi ;
      P_{M,N_M} := P_{M,N_M} ∪ {T_i} ;
    fi ;
  od ;
  for k := 1 to M step 1 do
    if P_{k,N_k} = ∅ then
      N_k := N_k − 1 ;
    fi ;
  od ;
end ;

B.3 Algorithm for Relocation upon Failure Detection

In the algorithms presented in [6,5], the optimality constraint imposed on each cluster of processors is

$m_i \le c_i \le M_i$

where

$m_i = C_i \cdot (\lambda - \epsilon), \quad M_i = C_i \cdot (\lambda + \epsilon).$

The workload bounds on a detection unit are derived accordingly from

$M(D_i) = \sum_{\forall n:\, P_n \in D_i} M_n, \qquad m(D_i) = \sum_{\forall n:\, P_n \in D_i} m_n.$

Each Leader maintains the following items in order to answer questions of other Leaders that cannot relocate
in their own detection unit.

• M(D_i),

• m(D_i),

• N_i.

In addition, each Leader maintains the following items both for relocating locally (within the detection unit)
and for relocating externally (moving processes to another detection unit).

• D_i = {P_1, ..., P_m},

• R_i, the root of the subtree of T_P corresponding to processors in D_i,

• r_i, the root of the subtree of t_p corresponding to processes in D_i.

B.3.1 Relocation within the Detection Unit

Algorithm Relocate within D_i on P_j Fault ;
begin /* m(D_i) − m_j ≤ N_i ≤ M(D_i) − M_j */
  do
    1. Update R_i to reflect P_j fault:
       1) delete node P_j from tree R_i ;
       2) update parent-nodes’ capacities in processor cluster tree ;
    2. Update optimality bounds:
       1) M(D_i) := M(D_i) − M_j ;
       2) m(D_i) := m(D_i) − m_j ;
    3. Invoke ALLOCATE(r_i, R_i) ;
  od ;
end ;

B.3.2 Algorithm for Relocation to Another Detection Unit

Algorithm Relocate Externally to D_i on P_j Fault ;
begin /* N_i > M(D_i) − M_j */
  LEADER(D_i) do
    1. Notify ∀P_n ∈ D_i on P_j fault ;
    2. Update R_i to reflect P_j fault:
       1) delete node P_j from tree R_i ;
       2) update parent-nodes’ capacities in processor cluster tree ;
    3. Update optimality bounds:
       1) M(D_i) := M(D_i) − M_j ;
       2) m(D_i) := m(D_i) − m_j ;
    4. Collect network status:
       1) ∀n ≠ i, n ∈ 1...k : send status request to Leader(D_n) ;
       2) ∀n ≠ i, n ∈ 1...k : collect answers (n, N_n, M(D_n), m(D_n)) ;
    5. Ensure global balancing constraint:
       Σ_{l=1}^{k} N_l ≤ Σ_{l=1}^{k} M(D_l) ;
    6. Generate candidate set C ;
    7. Rank(C) ;
    8. Select highest ranked (v*, k*), and migrate v* to D_{k*} ;
    9. Reflect migration and new relocation:
       1) N_i := N_i − W(v*) ;
       2) N_{k*} := N_{k*} + W(v*) ;
       3) Delete v* from r_i and update ancestors’ capacities ;
       4) if N_i > M(D_i) then goto step 8.
       /* iterate until done */
    10. Verify that the relocation holds:
        Invoke ALLOCATE(r_i, R_i) ;
  od ;
  LEADER(D_{k*}) do
    1. Fetch the already allocated S and append the incoming processes:
       S := S ⊕ {p_1, ..., p_s} ;
    2. r_{k*} := cluster(S) ;
    3. ALLOCATE(r_{k*}, R_{k*}) ;
    4. N_{k*} := N_{k*} + a ;
  od ;
end ;

B.4 Heuristic Scheduling Algorithm

The following algorithm is used in the Spring Operating System [16].

Procedure scheduler (task_set: task_set_type; var schedule: schedule_type; var schedulable: boolean)
/* task_set is the set of tasks to be scheduled. */
var EAT^s, EAT^e : vector_type ;
/* Resources’ earliest available times in shared and exclusive modes. */
begin
  schedule ← ∅ ;
  schedulable ← true ;
  EAT^s ← EAT^e ← 0 ;
  while ((task_set ≠ ∅) ∧ schedulable) do begin
    ∀T ∈ task_set : calculate T_est ;
    if ¬strongly_feasible(task_set, schedule) then
      schedulable ← false ;
    else begin
      ∀T ∈ task_set : apply function H ;
      T ← T : min_{∀T ∈ task_set}(H(T)) ;
      T_sst ← T_est ;
      task_set ← task_set − {T} ;
      schedule ← append(schedule, T) ;
      calculate new values for EAT^s and EAT^e ;
    end
  end
end ;

C Scheduling Functions Used by the Fault-Tolerant Allocation

C.l  Inserting  Time  Constraint  into  Object’s  Calendar 

type schedule_type = (preemptive, non-preemptive) ;

boolean function INSERT_TC (obj_sap: ↑object, TC_in: time_constraint) ;

var Already_Accepted : set of time_constraints ;
    Sched_Type : schedule_type ;
    constraint : time_constraint ;

begin
    lock calendar(obj_sap) ;
    /* Already_Accepted ← calendar(obj_sap) */
    /* Sched_Type ← obj_sap scheduler type */
    if (∃ constraint ∈ Already_Accepted:
            TC_in.Id = constraint.Id ∧
            TC_in.tag ≠ constraint.tag )
    then /* prevent computation connectivity reduction */
        result ← false ;
    else
        result ← scheduler.PUSH_TC(TC_in, Sched_Type) ;
        if (result) ∧ (TC_in.level > 1)
        then ∀ constraint ∈ Already_Accepted | constraint.Id = TC_in.Id ∧ constraint.tag = TC_in.tag :
            setup consistency control (see [3,12]) to obey
                1. identical non-determinism resolution, and
                2. identical order of servicing input requests,
        fi
    fi
    unlock calendar(obj_sap) ;
    return(result) ;
end
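
As an illustration of the insertion logic above, the following Python sketch keeps the calendar as a list guarded by a lock. The names `TimeConstraint`, `ObjectSAP`, `replica_groups` and `always_fits` are hypothetical, the schedulability check is passed in as a callable that stands in for scheduler.PUSH_TC, and the redundancy test `level > 1` is an assumption. Consistency control is reduced to recording which accepted constraints form a replica group.

```python
import threading
from dataclasses import dataclass, field

@dataclass(frozen=True)
class TimeConstraint:
    id: str       # computation identifier
    tag: int      # thread indicator of the computation
    level: int    # redundancy index

@dataclass
class ObjectSAP:
    calendar: list = field(default_factory=list)
    lock: threading.Lock = field(default_factory=threading.Lock)
    replica_groups: list = field(default_factory=list)  # stand-in for consistency control

def insert_tc(obj, tc_in, push_tc):
    """Try to accept tc_in into obj's calendar; return True on acceptance."""
    with obj.lock:
        # Reject a second thread of an already accepted computation
        # ("prevent computation connectivity reduction").
        if any(c.id == tc_in.id and c.tag != tc_in.tag for c in obj.calendar):
            return False
        # push_tc is a hypothetical schedulability check that appends
        # tc_in to the calendar when it fits.
        accepted = push_tc(obj.calendar, tc_in)
        if accepted and tc_in.level > 1:
            # Record the replicas that must be kept consistent.
            peers = [c for c in obj.calendar
                     if c.id == tc_in.id and c.tag == tc_in.tag]
            obj.replica_groups.append(peers)
        return accepted

def always_fits(calendar, tc):
    """Trivial placeholder for the schedulability verification."""
    calendar.append(tc)
    return True
```

With `always_fits` as the check, inserting a constraint for thread 1 of a computation succeeds, while a later request for thread 2 of the same computation on the same object is refused.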




C.2  Removing  Time  Constraint  from  Object’s  Calendar 

type time_constraint = construct
    { Id: computation identifier ;
      tag: thread indicator of computation Id ;
      level: redundancy index ;
      tc: convex_time_interval ;
      back_slack, for_slack : real ;
      P : non_convex_time_interval ;
      freq : real ;
      state : integer } ;

boolean function scheduler.REMOVE_TC (obj_sap: ↑object, TC_in: time_constraint) ;

var Already_Accepted : set of time_constraints ;
    constraint : time_constraint ;

begin
    lock calendar(obj_sap) ;
    /* Already_Accepted ← calendar(obj_sap) */
    if (∃ constraint ∈ Already_Accepted: TC_in = constraint )
    then
        ∀ constraint ∈ Already_Accepted | constraint.Id = TC_in.Id ∧ constraint.tag = TC_in.tag :
            rearrange consistency control (see section C.1 and [3,12]) ;
        Already_Accepted ← Already_Accepted − {TC_in} ;
        result ← true ;
    else
        result ← false ;
    fi
    unlock calendar(obj_sap) ;
    return (result) ;
end
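
The removal function can be sketched in the same spirit. In this Python illustration a constraint is a plain `(id, tag, level)` tuple, and `rearrange` is a hypothetical callback standing in for the consistency-control rearrangement; as a simplifying assumption it receives only the replicas that remain after the removal.

```python
import threading

def remove_tc(calendar, lock, tc_in, rearrange):
    """Remove tc_in from calendar (a list of (id, tag, level) tuples).

    Returns True if tc_in was present and has been removed, False otherwise.
    """
    with lock:
        if tc_in not in calendar:
            return False
        # Replicas of the same computation thread that stay in the calendar.
        peers = [c for c in calendar
                 if c[0] == tc_in[0] and c[1] == tc_in[1] and c != tc_in]
        rearrange(peers)          # hypothetical consistency-control hook
        calendar.remove(tc_in)
        return True
```

Passing `list.append` of a scratch list as `rearrange` is enough to observe which replicas would be reconfigured.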
C.3 Loading and Unloading Time Constraint in Object's Calendar

type time_constraint = construct
    { Id: computation identifier ;
      tag: thread indicator of computation Id ;
      level: redundancy index ;
      tc: convex_time_interval ;
      back_slack, for_slack : real ;
      P : non_convex_time_interval ;
      freq : real ;
      state : integer } ;

type resource_requirement = construct
    { ξ_R: temporal relation ;
      ↑resource ;
      R : non_convex_time_interval } ;
type service_alternative = construct
    { ξ_s: temporal relation ;
      ↑object_SAP ;
      s : non_convex_time_interval } ;
type service_requirement = set of service_alternatives ;
type schedule_type = (preemptive, non-preemptive) ;
type Answer_Wait_Indicator = (off, on, done) ;

var(obj_sap)
    calendar : ordered set of time_constraints ;
    Sched_Type : schedule_type ;
    dependency_set : set of k resource_requirements and n service_requirements ;
aux-var(obj_sap)
    wait_set : set of ↑object_SAP ;
    ∀ prev ∈ wait_set:
        Id : computation identifier ;
        tag : thread indicator of computation Id ;
        TC_me : time_constraint ;
        my_level : redundancy index ;
        s[n] : set of n service_requirements ;
        Ans[n] : set of n Answer_Wait_Indicators ;
local var
    Already_Accepted : set of time_constraints ;
    constraint : time_constraint ;
Upon receiving LOAD (obj_sap: ↑object, TC_in: time_constraint) ::

begin
    using joint(obj_sap):
        Let DS_{obj_sap,TC} = { {<R_i, TC_{R_i}> : 1 ≤ i ≤ k}, {S_i : 1 ≤ i ≤ n} }
        where S_i = {<s_j^(i), TC_{S_i}> : 1 ≤ j ≤ M^(i)} .
    for r ← 1 to k step 1 do
        TC_{R_r} ← project(ξ_{R_r}, TC_in) ;
        CHANGE_TC_STATE ( (R_r, TC_{R_r}), active ) ;
    od
    CHANGE_TC_STATE ( (obj_sap, TC_in), active ) ;
    for r ← 1 to n step 1 do
        TC_{S_r} ← project(ξ_{S_r}, TC_in) ;
        for q ← 1 to (j : j ≤ M^(r) ∧ s[r] = s_j^(r)) step 1 do
            LOAD( s[r], TC_{S_r} ) ;
        od
    od
end


Upon receiving UNLOAD (obj_sap: ↑object, TC_in: time_constraint) ::

begin
    using joint(obj_sap):
        Let DS_{obj_sap,TC} = { {<R_i, TC_{R_i}> : 1 ≤ i ≤ k}, {S_i : 1 ≤ i ≤ n} }
        where S_i = {<s_j^(i), TC_{S_i}> : 1 ≤ j ≤ M^(i)} .
    for r ← 1 to k step 1 do
        TC_{R_r} ← project(ξ_{R_r}, TC_in) ;
        REMOVE_TC (R_r, TC_{R_r}) ;
    od
    REMOVE_TC (obj_sap, TC_in) ;
    for r ← 1 to n step 1 do
        TC_{S_r} ← project(ξ_{S_r}, TC_in) ;
        for q ← 1 to (j : j ≤ M^(r) ∧ s[r] = s_j^(r)) step 1 do
            UNLOAD( s[r], TC_{S_r} ) ;
        od
    od
end
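
The recursive propagation performed by the LOAD and UNLOAD handlers can be sketched in Python as follows. This is a simplified model under stated assumptions: a time constraint is a `(id, start, end)` tuple, each temporal relation is replaced by a plain time offset, `project` shifts the interval by that offset, and setting a dictionary entry stands in for CHANGE_TC_STATE. The class `Sap` and all field names are hypothetical.

```python
from dataclasses import dataclass, field

@dataclass
class Sap:
    name: str
    resources: dict = field(default_factory=dict)  # resource name -> offset
    services: list = field(default_factory=list)   # (selected server Sap, offset)
    calendar: dict = field(default_factory=dict)   # time constraint -> state

def project(offset, tc):
    """Assumed projection: shift the constraint's interval by an offset."""
    tc_id, start, end = tc
    return (tc_id, start + offset, end + offset)

def load(sap, tc, resource_calendars):
    """Activate tc on sap, on its resources, and on the selected servers."""
    for res, offset in sap.resources.items():
        resource_calendars.setdefault(res, {})[project(offset, tc)] = "active"
    sap.calendar[tc] = "active"
    for server, offset in sap.services:
        load(server, project(offset, tc), resource_calendars)

def unload(sap, tc, resource_calendars):
    """Remove tc from sap, from its resources, and from the selected servers."""
    for res, offset in sap.resources.items():
        resource_calendars.get(res, {}).pop(project(offset, tc), None)
    sap.calendar.pop(tc, None)
    for server, offset in sap.services:
        unload(server, project(offset, tc), resource_calendars)
```

Loading a constraint on a client object therefore marks the projected constraint on every resource and server object it depends on, and unloading undoes exactly those entries.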




References 


[1]  Agrawala  A.  K.  and  Levi  S.-T.,  Objects  Architecture  for  Real-Time,  Distributed,  Fault  Tolerant  Operating 
Systems,  IEEE  Workshop  on  Real-Time  Operating  Systems,  Cambridge  MA,  July  1987. 

[2] Chu W. W. and Lan L. M-T., Task Allocation and Precedence Relations for Distributed Real-Time
Systems, IEEE Trans on Computers, Vol C-36 No 8 pp 667-679, June 1987.

[3] Cooper E. C., Replicated Procedure Call, ACM Operating Systems Review, Vol 20 No 1 pp 44-55, Jan
1986.

[4] Davari S. and Dhall S. K., An On-Line Algorithm for Real-Time Task Allocation, Proceedings of Real-Time
Systems Symposium (IEEE), pp 194-199, December 2-4, 1986, New Orleans, LA.

[5] Ferguson D., Kar G., Leitner G. and Nikolaou C., Relocating Processes in Distributed Computer Systems,
IEEE Proceedings of the Fifth Symposium on Reliability in Distributed Software and Database Systems,
pp 171-177, January 1986, Los Angeles, CA.

[6] Kar G., Nikolaou C., and Reif J., Assigning Processes to Processors: A Fault Tolerant Approach,
Proceedings of 14th International Conference on Fault Tolerant Computing Systems (FTCS), pp 306-309,
June 1984, Kissimmee, FL.

[7] Levi S.-T. and Agrawala A. K., Objects Architecture: A Comprehensive Design Approach for Real-Time,
Distributed, Fault-Tolerant, Reactive Operating Systems, CS-TR-1915, Technical Report, Department
of Computer Science, University of Maryland, College Park, Maryland, September 1987.

[8] Levi S.-T. and Agrawala A. K., Temporal Relations and Structures in Real-Time Operating Systems,
CS-TR-1954, Technical Report, Department of Computer Science, University of Maryland, College Park,
Maryland, December 1987.

[9] Liu C. L. and Layland J. W., Scheduling Algorithms for Multiprogramming in Hard Real-Time
Environment, Journal of the ACM, Vol 20 No 1 pp 46-61, Jan 1973.

[10] Ma R. P., Lee E., Tsuchiya M., Design of Task Allocation Scheme for Time Critical Applications, Real
Time Systems Symposium (IEEE), Miami Beach FL, Dec 1981.

[11] Ma R. P., Lee E., Tsuchiya M., A Task Allocation Model for Distributed Computing Systems, IEEE
Transactions on Computers, Vol C-31 No 1, Jan 1982.

[12] Mancini L., Modular Redundancy in a Message Passing System, IEEE Trans. on Software Engineering,
Vol SE-12 No 1 pp 79-86, Jan 1986.

[13] Mok A. K. and Dertouzos M. L., Multiprocessor Scheduling in a Hard Real-Time Environment,
Proceedings of the Seventh Texas Conference on Computing Systems, pp 5.1-5.12, October 30 - November
1, 1978, Houston, Texas.

[14]  Mok  A.,  Fundamental  Design  Problems  for  the  Hard  Real  Time  Environment,  MIT  Ph.D.  Dissertation, 
Cambridge  MA,  May  1983. 




[15] Stankovic J. A. (editor), Real-Time Computing Systems: The Next Generation, March 1987 CMU
Workshop on Fundamental Issues in Distributed Real-Time Systems, Carnegie Mellon University, Pittsburgh,
Pennsylvania, November 23, 1987.

[16] Stankovic J. A. and Ramamritham K., The Design of the Spring Kernel, Proceedings of Real-Time
Systems Symposium, pp 146-157, San Jose, California, December 1987.

[17] Zhao W., Ramamritham K. and Stankovic J., Scheduling Tasks with Resource Requirements in Hard
Real-Time Systems, IEEE Trans on Software Engineering, Vol SE-13 No 5 pp 564-577, May, 1987.

[18] Zhao W., Ramamritham K. and Stankovic J., Preemptive Scheduling under Time and Resource
Constraints, IEEE Trans on Computers, Vol C-36 No 8 pp 949-960, August, 1987.


REPORT DOCUMENTATION PAGE

1a. Report security classification: UNCLASSIFIED
3. Distribution/availability of report: Approved for public release; distribution unlimited.
4. Performing organization report number(s): UMIACS-TR-88-32, CS-TR-2018
6c. Address (city, state, and ZIP code): Department of Computer Science, University of Maryland, College Park, MD 20742
7a. Name of monitoring organization: Office of Naval Research
9. Procurement instrument identification number: DASG60-87-C-0066, N00014-87-0241
11. Title (include security classification): Allocation of Real-Time Computations under Fault Tolerance Constraints
12. Personal author(s): Shem-Tov Levi, Daniel Mosse, and Ashok K. Agrawala
14. Date of report (year, month, day): May 3, 1988

19. Abstract:

Allocation of resources in "next-generation" real-time operating systems requires some important
features in addition to those demonstrated by current systems, resulting in an increased complexity of each
system. The allocation is closely related to the scheduling, and the two are based on time considerations,
rather than on a static priority scheme. The allocation is fault tolerance motivated, to cope with the
application's reliability goals. Distributed system issues and adaptive behavior requirements increase the
complexity and significance of the allocation approach.

The allocation scheme we propose here accomplishes the hard real-time goal of guaranteeing a deadline
satisfaction in case the job is accepted. In addition, this allocation scheme supports fault tolerance
objectives in both damage containment and resiliency requirements. It does this in cooperation with
a schedulability verification mechanism, and with an object architecture in which for each object there
exists a calendar that maintains the time of its execution. A nice feature of this scheme is the way in
which it can be used for reallocation while increasing the resiliency.

Keywords: real-time operating systems, real-time resource management.

21. Abstract security classification: UNCLASSIFIED
22a. Name of responsible individual: Ashok K. Agrawala

DD Form 1473, 84 MAR. Security classification of this page: UNCLASSIFIED


